Subspace-Sparse Representation

Chong You and Rene Vidal 


Abstract —Given an overcomplete dictionary A and a signal 
b that is a linear combination of a few linearly independent 
columns of A, classical sparse recovery theory deals with the 
problem of recovering the unique sparse representation x such
that b = Ax. It is known that under certain conditions on A, x
can be recovered by the Basis Pursuit (BP) and the Orthogonal
Matching Pursuit (OMP) algorithms. In this work, we consider
the more general case where b lies in a low-dimensional subspace
spanned by some columns of A, which are possibly linearly
dependent. In this case, the sparsest solution x is generally not
unique, and we study the problem of whether the representation x
identifies the subspace, i.e., whether the nonzero entries of x
correspond to dictionary atoms that are in the subspace. Such a
representation x is called subspace-sparse. We present sufficient
conditions for guaranteeing subspace-sparse recovery, which have
clear geometric interpretations and explain properties of
subspace-sparse recovery. We also show that the sufficient
conditions can be satisfied under a randomized model. Our results
are applicable to the traditional sparse recovery problem, and we
obtain conditions for sparse recovery that are less restrictive
than the canonical mutual coherence condition. We also use the
results to analyze the sparse representation based classification
(SRC) method, for which we obtain conditions that show its
correctness.

I. Introduction 

Sparsity has played an important role in the area of signal processing for the past few years. Given an overcomplete dictionary $A \in \mathbb{R}^{D \times J}$, consider the sparse pursuit program

$$\min_x \|x\|_0 \quad \text{s.t.} \quad b = Ax, \tag{1}$$

in which $\|\cdot\|_0$ counts the number of nonzero entries. Sparse representation is concerned with the uniqueness of the solution and with how the solution can be recovered efficiently [1], [2], [3]. Since solving (1) is generally computationally intractable, it is usually approached by approximate algorithms such
as Orthogonal Matching Pursuit (OMP) 0 and Basis Pursuit (BP) 0. There have also been studies of these algorithms, and the results show that if $A$ is sufficiently incoherent 0, Q, or satisfies the so-called restricted isometry property [9], [10], [11], [12], [13], [14], then the true sparsest solution can be found by these approximate algorithms.

In this work, we consider an extension of canonical sparse recovery to cases where the dictionary is not necessarily incoherent. Let $\mathcal{A} = \{a_j, j \in \mathcal{J}\}$ be the set of all columns of $A$ in problem (1), where $\mathcal{J} = \{1, \dots, J\}$. We consider the case that the dictionary $\mathcal{A}$ is subspace-structured, i.e., there is a set $\mathcal{J}_0 \subseteq \mathcal{J}$ such that $\mathcal{A}_0 := \{a_j, j \in \mathcal{J}_0\}$ spans a low-dimensional subspace, denoted as $S_0$. In this case the dictionary is not necessarily incoherent, e.g., two atoms in $\mathcal{A}_0$ could be arbitrarily close or even identical. Moreover, for any $b \in S_0$, the solution to (1) is generally not unique, since one can get equally sparse solutions by using any $d_0$ atoms from $\mathcal{A}_0$, where $d_0 := \dim(S_0)$. The goal in this case is not to recover any specific one of these solutions. Observing that all of them represent $b$ using atoms only from $\mathcal{A}_0$, we study whether the solution to (1) has this general property. A solution that satisfies this property is called subspace-sparse. Similar to sparse recovery, in the subspace-sparse recovery problem we study whether approximate algorithms such as OMP and BP give subspace-sparse representations.

The term subspace-sparse representation was proposed in [15]; such a representation is also said to be subspace-preserving [16], to satisfy the subspace-detection property [17], or to have exact feature selection [18] in general non-sparse contexts. The concept plays a key role in analyzing subspace-structured data for the tasks of classification [19], [20], [21] and clustering [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], with applications to face recognition, motion segmentation, video segmentation, etc. The idea has also inspired new methods with applications to visual object tracking [33], [34], action recognition [35], [36], subset selection [37], and so on.

Following the initial work of ED, several recent works ESl . 
ED, HD, ED, ED, HD have studied the subspace-sparse 
recovery problem in the context of subspace clustering, where 
the task is to cluster a collection of points lying in a union of 
subspaces. In this case, the problem is solved by first finding 
a subspace-sparse representation of each point in terms of a 
dictionary composed of all other points and then applying 
spectral clustering to these subspace-sparse representations. 
Notice, however, that these analyses are specific to the correctness of subspace clustering. In this work we study the more general subspace-sparse recovery problem, where the signal to be represented is an arbitrary point in the subspace $S_0$, and the goal is to derive conditions on the dictionary under which the OMP and BP algorithms are guaranteed to give subspace-sparse solutions. Based on the analysis, we also obtain new theoretical conditions for classical sparse recovery and sparse representation based classification.

A. Problem formulation and relation with sparse recovery 

Given a dictionary $\mathcal{A} = \{a_j \in \mathbb{R}^D, j \in \mathcal{J}\}$, suppose that there is a partition $\mathcal{J} = \mathcal{J}_0 \cup \mathcal{J}_c$ such that $\mathcal{A}_0 := \{a_j, j \in \mathcal{J}_0\}$ contains points that are in a subspace $S_0 := \mathrm{span}(\mathcal{A}_0)$ of dimension $d_0 < D$, and $\mathcal{A}_c := \{a_j, j \in \mathcal{J}_c\}$ contains points that are not in the subspace $S_0$. For an arbitrary point $b \in S_0$, by applying the BP or the OMP algorithm to $b$ with dictionary $\mathcal{A}$, we get a sparse vector $x$ such that $b = Ax$. The problem of subspace-sparse recovery is to study the conditions on the dictionary $\mathcal{A}$ under which the representation $x$ is subspace-sparse, i.e., $x_j \neq 0$ only if $j \in \mathcal{J}_0$. We also assume that all atoms in the dictionary $\mathcal{A}$ are normalized to have unit $\ell_2$ norm.

Classical sparse recovery is a particular case of subspace-sparse recovery. Assuming that there is an unknown vector $x$ that is $s_0$-sparse (i.e., $x$ has at most $s_0$ nonzero entries), sparse recovery studies the problem of recovering it from the measurement $b = Ax$ by algorithms such as BP and OMP. In order for this problem to be well posed, $x$ needs to be the unique sparsest solution, thus the $s_0$ atoms of $A$ corresponding to the $s_0$ nonzero entries of $x$ must be linearly independent. On the other hand, if we assume in the subspace-sparse recovery problem formulation that the set $\mathcal{A}_0$ contains $s_0 := \mathrm{card}(\mathcal{A}_0)$ linearly independent points, then the subspace-sparse solution is unique for any $b \in S_0$. In such cases, the conditions for guaranteeing subspace-sparse recovery also guarantee sparse recovery of any $s_0$-sparse vector.

B. Results and Contributions 

We summarize our major subspace-sparse recovery results, which are discussed in detail in Sections III and IV.

Theorems 1 and 2 introduce, respectively, the principal recovery condition (PRC) and the dual recovery condition (DRC) for subspace-sparse recovery. Both of them are conditions on the dictionary $\mathcal{A}$ under which both OMP and BP give a subspace-sparse solution for every $b \in S_0$.

The PRC requires that

$$\gamma_0 < s(\mathcal{A}_c, S_0), \tag{2}$$

where the left-hand side, $\gamma_0$, is the covering radius of the points $\mathcal{A}_0$, which is defined as the smallest angle such that any point in the subspace $S_0$ is within angle $\gamma_0$ of at least one point in $\mathcal{A}_0$ (up to sign). The covering radius measures how well distributed the atoms $\mathcal{A}_0$ are in the subspace $S_0$, and should be relatively small if the points are equally distributed in all directions within the subspace and not skewed in a certain direction. The right-hand side, $s(\mathcal{A}_c, S_0)$, is the minimum angle between any atom in $\mathcal{A}_c$ and any point in the subspace $S_0$. It is large when all pairs of points from the two sets are sufficiently separated. Thus, intuitively, the PRC requires the atoms $\mathcal{A}_0$ to be sufficiently well spread out and the atoms $\mathcal{A}_c$ to be sufficiently far away from the subspace $S_0$.

The PRC has the drawback that $S_0$ on the right-hand side contains infinitely many points, making the requirement too strong. We show that a finite subset of the points in $S_0$ is sufficient for this purpose, leading to the DRC:

$$\gamma_0 < s(\mathcal{A}_c, \mathcal{D}_0), \tag{3}$$

where $\mathcal{D}_0$ is a finite subset of the points in the subspace $S_0$, which will be defined in Section II-C. The DRC does not require all points in the subspace $S_0$ to be away from the atoms in $\mathcal{A}_c$, as the PRC does. Instead, keeping the finite number of points in $\mathcal{D}_0$ away from $\mathcal{A}_c$ is sufficient for all the points in $S_0$. Hence, the DRC is implied by the PRC, and it therefore gives a stronger result.

In Theorem 9 we show that the DRC can be satisfied under a probabilistic model. Assume that the atoms in $\mathcal{A}_0$ are independently and uniformly distributed on the unit sphere of the subspace $S_0$, and that the atoms in $\mathcal{A}_c$ are independently and uniformly distributed on the unit sphere of the ambient space $\mathbb{R}^D$. Then, under the condition that $2 \le d_0 \le \sqrt{D/2}$, the DRC is satisfied with a probability $p$ that 1) is an increasing function of $D$, 2) is a decreasing function of $d_0$, and 3) goes to 100% as we increase $\mathrm{card}(\mathcal{J}_0)$ to infinity while keeping $\mathrm{card}(\mathcal{J}_c)/\mathrm{card}(\mathcal{J}_0)$ fixed. This says that BP and OMP work better for subspace-sparse recovery when the subspace dimension is low relative to the ambient dimension and when the dictionary is densely sampled.

C. Applications 

In Section V-A we show that our results on subspace-sparse recovery can be applied to the analysis of the traditional sparse recovery problem. The results are new conditions on a dictionary that guarantee exact sparse recovery of any $s$-sparse vector by BP and OMP. We discuss how this condition can be computed, as well as its relation to the traditional mutual coherence condition.

We then discuss in Section V-B the method of Sparse Representation based Classification (SRC) mi. This method was first proposed for the task of face image classification, in which one is given several aligned face images for each of several subjects, and the task is to classify any query face image that belongs to one of these subjects. The rationale is that for a Lambertian object, the set of all images taken under varying lighting conditions can be well approximated by a low-dimensional subspace. Thus, it is proposed in ifT^ that one uses all the labeled images of all subjects as a dictionary and finds a sparse representation of any query image using this dictionary; the class label is then assigned to the group that corresponds to the positions of the nonzero entries. The method is generally viewed as an application of sparse representation, but it lacks a theoretical justification, and there have been discussions and doubts about its effectiveness Hi, m, ia, [43]. In this work, we analyze SRC from the perspective of subspace-sparse recovery, and provide an analysis for it based on our results.

II. Background 

The purpose of this section is to introduce background for 
understanding the main results of the paper. We first briefly 
review the OMP and BP methods for completeness. We then 
define geometric quantities for characterizing the dictionary $\mathcal{A}$
and discuss their basic properties.

A. Algorithms

OMP and BP are two methods for sparse recovery. For a dictionary $A$ and a signal $b$, consider the problem

$$\arg\min_x \|x\|_0 \quad \text{s.t.} \quad Ax = b.$$

OMP is a greedy method that sequentially chooses one dictionary atom in a locally optimal manner. It keeps track of a residual $v_k$ at step $k$, initialized as the input signal $b$, and a set $W_k$ that contains the atoms already chosen, initialized as the empty set. At each step, $W_k$ is updated to $W_{k+1}$ by adding the dictionary atom that has the maximum absolute inner product with $v_k$. Then, $v_k$ is updated to $v_{k+1}$ by setting it to be the component of $b$ that is orthogonal to the space spanned by the atoms indexed by $W_{k+1}$. The process is terminated when a precise representation of $b$ is established, i.e., when $v_k = 0$ for some $k$.
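The greedy selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `omp_support`, the tolerance `tol`, and the toy dictionary are hypothetical, and only the selected indices are returned, since the support is what decides whether a representation is subspace-sparse.

```python
import math

def omp_support(atoms, b, tol=1e-10):
    """Greedy OMP sketch; returns the indices of the selected atoms.

    `atoms` is a list of unit-norm vectors (the columns of A) and `b`
    is the signal. Hypothetical helper for illustration only.
    """
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    residual = list(b)
    basis = []    # orthonormal basis of the span of the chosen atoms
    support = []
    while math.sqrt(dot(residual, residual)) > tol and len(support) < len(atoms):
        # Pick the atom with the largest absolute inner product with the residual.
        j = max(range(len(atoms)), key=lambda k: abs(dot(atoms[k], residual)))
        # Gram-Schmidt: orthogonalize the new atom against the current basis.
        q = list(atoms[j])
        for u in basis:
            c = dot(u, q)
            q = [qi - c * ui for qi, ui in zip(q, u)]
        norm_q = math.sqrt(dot(q, q))
        if norm_q <= tol:
            break  # no new direction, the residual cannot be reduced further
        q = [qi / norm_q for qi in q]
        basis.append(q)
        support.append(j)
        # The residual stays orthogonal to the span of all chosen atoms.
        c = dot(q, residual)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return support

# Toy dictionary: three atoms in the xy-plane and one atom outside of it.
r = 1 / math.sqrt(2)
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (r, r, 0.0), (0.0, 0.0, 1.0)]
support = omp_support(atoms, (0.6, 0.8, 0.0))
```

In this toy example the signal lies in the xy-plane, and the selected indices all refer to atoms in that plane, so the resulting representation is subspace-sparse in the sense used in this paper.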

BP is a convex relaxation approach. The idea is to use the $\ell_1$ norm in lieu of the $\ell_0$ norm, i.e., to solve

$$P(A, b) := \arg\min_x \|x\|_1 \quad \text{s.t.} \quad Ax = b. \tag{4}$$

It has the benefit that (4) is convex and can be solved more efficiently. We will denote the objective value of $P(A, b)$ by $p(A, b)$, and by convention, $p(A, b) = +\infty$ if the problem is infeasible. The dual of the above optimization program is

$$D(A, b) := \arg\max_\omega \langle \omega, b \rangle \quad \text{s.t.} \quad \|A^\top \omega\|_\infty \le 1. \tag{5}$$

Let $d(A, b)$ be the objective value of the dual problem $D(A, b)$. If the primal problem is feasible, then strong duality holds, i.e., $p(A, b) = d(A, b)$.
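One standard way to see why (4) is efficiently solvable is to rewrite it as a linear program. The following sketch uses auxiliary variables $u$ and $v$, which are introduced here only for illustration and do not appear elsewhere in the paper; they split $x$ into its positive and negative parts, $x = u - v$:

$$\min_{u, v} \; \mathbf{1}^\top (u + v) \quad \text{s.t.} \quad A(u - v) = b, \quad u \ge 0, \quad v \ge 0.$$

At an optimal solution at most one of $u_j$, $v_j$ is nonzero for each $j$, so that $\mathbf{1}^\top (u + v) = \|x\|_1$, and the dual of this linear program is the program in (5).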

B. Sphere and spherical distance 

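The spherical distance is defined as the angle between two points in the space $\mathbb{R}^D \setminus \{0\}$.

Definition 1 (Spherical distance). The spherical distance $s(v, w)$ of two points $v, w \in \mathbb{R}^D \setminus \{0\}$ is defined as

$$s(v, w) := \arccos \left\langle \frac{v}{\|v\|_2}, \frac{w}{\|w\|_2} \right\rangle.$$

The spherical distance is in the range $[0, \pi]$. For notational convenience, we allow one or both operands of $s(\cdot, \cdot)$ to be sets, in which case the spherical distance is taken to be the infimum over all pairs of points, i.e., for any $\mathcal{V} \subseteq \mathbb{R}^D$, $\mathcal{W} \subseteq \mathbb{R}^D$,

$$s(\mathcal{V}, \mathcal{W}) := \inf_{v \in \mathcal{V} \setminus \{0\}} \; \inf_{w \in \mathcal{W} \setminus \{0\}} s(v, w).$$

Let $\mathbb{S}^{D-1} := \{v \in \mathbb{R}^D : \|v\|_2 = 1\}$ be the set of unit vectors in $\mathbb{R}^D$. It is known that $s(\cdot, \cdot)$ defines a metric on $\mathbb{S}^{D-1}$.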
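As a small numerical illustration of the spherical distance and its extension to sets, the following sketch may help. The function names `spherical_distance` and `set_distance` are assumptions made for this example, and the set version only covers finite sets, for which the infimum is a minimum.

```python
import math

def spherical_distance(v, w):
    """Angle in [0, pi] between two nonzero vectors v and w."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    # Clamp to [-1, 1] to guard against round-off before calling acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_v * norm_w)))
    return math.acos(cos_angle)

def set_distance(V, W):
    """Smallest spherical distance over all pairs of nonzero points of two finite sets."""
    V = [v for v in V if any(v)]  # the definition excludes the origin
    W = [w for w in W if any(w)]
    return min(spherical_distance(v, w) for v in V for w in W)

right_angle = spherical_distance([1.0, 0.0], [0.0, 2.0])
to_symmetrized = set_distance([[1.0, 0.0], [-1.0, 0.0]], [[-1.0, 1.0]])
```

The second call measures the distance from a point to a symmetrized pair $\{\pm a\}$, which is the form in which the spherical distance is used in the OMP analysis.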

C. Geometric characterization of the dictionary 

The deterministic subspace-sparse recovery conditions rely on geometric properties of the dictionary $\mathcal{A}$ that characterize the distribution of the atoms in $\mathcal{A}_0$ and the separation between the atoms in $\mathcal{A}_0$ and $\mathcal{A}_c$. We first introduce the concept of covering radius.

Definition 2 (Covering radius). Given the space $\mathbb{S}^{D-1}$ with metric $s(\cdot, \cdot)$, the (relative^1) covering radius of a set of points $\mathcal{V} \subseteq \mathbb{S}^{D-1}$ is defined as

$$\gamma(\mathcal{V}) := \max\{ s(\mathcal{V}, w) : w \in \mathrm{span}(\mathcal{V}) \cap \mathbb{S}^{D-1} \}.$$

Intuitively, given a set of points $\mathcal{V}$, we find a point on the unit sphere of $\mathrm{span}(\mathcal{V})$ that is furthest away from all the points in $\mathcal{V}$. The name covering radius also suggests another interpretation: it is the smallest radius such that closed balls of that radius centered at the points of $\mathcal{V}$ cover all points in $\mathbb{S}^{D-1} \cap \mathrm{span}(\mathcal{V})$. Thus, this concept characterizes how well the points in $\mathcal{V}$ are distributed, without leaving a large patch of empty region unfilled by any point.

Using this concept, the distribution of the atoms $\mathcal{A}_0$ is characterized by the covering radius of the set of symmetrized points $\pm\mathcal{A}_0 := \{\pm a_i, i \in \mathcal{J}_0\}$. We will use the simplified notation $\gamma_0 := \gamma(\pm\mathcal{A}_0)$. Intuitively, if $\gamma_0$ is small, then there are enough sample points in the subspace $S_0$, and subspace-sparse recovery is expected to be easier.

Denote $\mathcal{K}_0 := \mathrm{conv}(\pm\mathcal{A}_0)$, where $\mathrm{conv}(\cdot)$ is the convex hull of a set of points. It can be identified as a symmetric convex body, defined below.

Definition 3 (Symmetric convex body). A convex set $\mathcal{P}$ that satisfies $\mathcal{P} = -\mathcal{P}$ is called symmetric. A compact convex set with nonempty interior is called a convex body.

Definition 4 (Polar set). The (relative^1) polar of a set $\mathcal{P}$ is defined as $\mathcal{P}^o = \{v \in \mathrm{span}(\mathcal{P}) : \langle v, w \rangle \le 1, \forall w \in \mathcal{P}\}$.

By this definition, the polar set of $\mathcal{K}_0$ is given by $\mathcal{K}_0^o := \{v \in S_0 : |\langle v, a_i \rangle| \le 1, \forall i \in \mathcal{J}_0\}$. Specifically, $\mathcal{K}_0^o$ is also a symmetric convex body, as the polar of a convex body is also a convex body B31.

A subset of the points in $\mathcal{K}_0^o$ will play a critical role.

Definition 5 (Extreme point). A point $v$ in a convex set $\mathcal{P}$ is an extreme point if it cannot be expressed as a strict convex combination of two other points in $\mathcal{P}$, i.e., there are no $\lambda \in (0, 1)$, $v_1, v_2 \in \mathcal{P}$, $v_1 \neq v_2$, such that $v = (1 - \lambda) v_1 + \lambda v_2$.

Definition 6 (Dual point). The set of dual points of the set $\mathcal{A}_0$, denoted by $\mathcal{D}_0$, is defined as the set of extreme points of the set $\mathcal{K}_0^o$.

A geometric illustration of some of the definitions is provided in Figure 1(a). In the following, we discuss some properties that help in understanding these concepts and that are used later.

The following result shows that the set $\mathcal{K}_0^o$ is bounded in terms of the covering radius $\gamma_0$. The intuitive justification is that if $\gamma_0$ is small, then the points $\mathcal{A}_0$ are dense on the unit sphere, so the polar set $\mathcal{K}_0^o$ should be smaller.

Lemma 1. Given $\mathcal{A}_0$, assume that $\|a_i\|_2 = 1, \forall i \in \mathcal{J}_0$. Then $\max\{\|v\|_2 : v \in \mathcal{K}_0^o\} = 1/\cos\gamma_0$.

The following result shows that the dual set $\mathcal{D}_0$ is finite. Essentially, the dual set is composed of the vertices of the polar set $\mathcal{K}_0^o$.

Lemma 2. Given any $\mathcal{A}_0$, the set $\mathcal{D}_0$ is finite. Specifically,

$$\mathrm{card}(\mathcal{D}_0) \le 2^{d_0} \binom{s_0}{d_0}, \tag{6}$$

in which $s_0 = \mathrm{card}(\mathcal{A}_0)$, $d_0 = \dim(S_0)$.


Moreover, all points in $\mathcal{K}_0^o$ are convex combinations of these finitely many dual points in $\mathcal{D}_0$. This is implied by the following stronger result.

Lemma 3 ([45]). The set of the extreme points of a convex body $\mathcal{P}$ is the smallest subset of $\mathcal{P}$ with convex hull $\mathcal{P}$.
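The definitions of this subsection can be made concrete for a two-dimensional subspace, where the atoms lie on a unit circle and the polar set is a polygon. The sketch below is an illustration under these assumptions only: the helper names `covering_radius_2d` and `dual_points_2d` are hypothetical, atoms are given by their polar angles, and the dual points are found by brute force as the feasible intersections of pairs of constraint lines $\langle a_i, v \rangle = \pm 1$.

```python
import math
from itertools import combinations, product

def covering_radius_2d(angles):
    """Covering radius of the symmetrized atoms on the unit circle.

    The point farthest from all of the symmetrized atoms sits in the
    middle of the widest angular gap, so the radius is half that gap.
    """
    pts = sorted({a % math.pi for a in angles})  # a and -a share an angle mod pi
    gaps = [hi - lo for lo, hi in zip(pts, pts[1:])]
    gaps.append(pts[0] + math.pi - pts[-1])       # wrap-around gap
    return max(gaps) / 2.0

def dual_points_2d(angles, tol=1e-9):
    """Vertices of the polar polygon {v : |<a_i, v>| <= 1 for all i}."""
    atoms = [(math.cos(t), math.sin(t)) for t in angles]
    vertices = []
    for (a, b), (sa, sb) in product(combinations(atoms, 2), product((1, -1), repeat=2)):
        det = a[0] * b[1] - a[1] * b[0]
        if abs(det) < tol:
            continue  # parallel constraint lines do not meet in a vertex
        # Solve <a, v> = sa and <b, v> = sb by Cramer's rule.
        v = ((sa * b[1] - sb * a[1]) / det, (a[0] * sb - b[0] * sa) / det)
        feasible = all(abs(c[0] * v[0] + c[1] * v[1]) <= 1 + tol for c in atoms)
        is_new = all(math.dist(v, u) > tol for u in vertices)
        if feasible and is_new:
            vertices.append(v)
    return vertices

angles = [0.0, math.pi / 3, 2 * math.pi / 3]    # atoms at 0, 60 and 120 degrees
gamma0 = covering_radius_2d(angles)              # 30 degrees for this example
duals = dual_points_2d(angles)                   # vertices of a hexagon
largest_norm = max(math.hypot(*v) for v in duals)
```

For this example the symmetrized atoms form a regular hexagon, the largest norm among the dual points agrees with $1/\cos\gamma_0$ as stated in Lemma 1, and the number of dual points stays below the bound of Lemma 2.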

^1 It is more convenient to work with the relative quantities in the covering
radius and the polar set since the data $\mathcal{A}_0$ are in a subspace.
(b) Geometry in 3D

Fig. 1. Illustration of the geometry of subspace-sparse recovery. The dictionary atoms $\mathcal{A}_0$ (drawn in blue) lie on the unit circle (drawn in black) of a two-dimensional subspace $S_0$. Left: illustration of the definitions for characterizing $\mathcal{A}_0$, where the red dots are the dual points. Right: illustration of the geometry of the PRC and the DRC; see text for details.


III. Subspace-Sparse Recovery: 
Deterministic Result 

In this section, we discuss the theory of subspace-sparse recovery. We start by formally introducing the two conditions, PRC and DRC, that guarantee the correctness of both OMP and BP for subspace-sparse recovery, and then study BP and OMP separately in detail.

A. Subspace-sparse recovery conditions 

Let $\mathrm{BP}(\mathcal{A}, b)$ and $\mathrm{OMP}(\mathcal{A}, b)$ be the (sets of) solutions given by the two algorithms. We present conditions under which the solutions $\mathrm{BP}(\mathcal{A}, b)$ and $\mathrm{OMP}(\mathcal{A}, b)$ are subspace-sparse for all $b$ in the subspace $S_0$. Concretely, we identify the following two conditions for our objective.

Definition 7. A dictionary $\mathcal{A} = \mathcal{A}_0 \cup \mathcal{A}_c$ is said to satisfy the principal subspace-sparse recovery condition (PRC) if

$$\gamma_0 < s(\mathcal{A}_c, S_0), \tag{7}$$

in which $\gamma_0$ is the covering radius of $\pm\mathcal{A}_0$ and $S_0$ is the span of $\mathcal{A}_0$. It is said to satisfy the dual subspace-sparse recovery condition (DRC) if

$$\gamma_0 < s(\mathcal{A}_c, \mathcal{D}_0), \tag{8}$$

in which $\mathcal{D}_0$ is the set of dual points of $\mathcal{A}_0$.

The results for subspace-sparse recovery are as follows. 

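Theorem 1. If the PRC is satisfied by a dictionary $\mathcal{A} = \mathcal{A}_0 \cup \mathcal{A}_c$, then $\mathrm{BP}(\mathcal{A}, b)$ and $\mathrm{OMP}(\mathcal{A}, b)$ are both subspace-sparse for all $b \in S_0$.

Theorem 2. If the DRC is satisfied by a dictionary $\mathcal{A} = \mathcal{A}_0 \cup \mathcal{A}_c$, then $\mathrm{BP}(\mathcal{A}, b)$ and $\mathrm{OMP}(\mathcal{A}, b)$ are both subspace-sparse for all $b \in S_0$.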

As both theorems show, two major factors affect subspace-sparse recovery. The first is that the atoms indexed by $\mathcal{J}_0$ should be well spread out across the subspace $S_0$, as measured by the covering radius on the left-hand side of (7) and (8). The second factor is that the atoms in $\mathcal{A}_c$ should not be too close to points in $S_0$ in the case of the PRC, or to points in $\mathcal{D}_0$ in the case of the DRC. Furthermore, note that the PRC requires the atoms in $\mathcal{A}_c$ to be away from all points in the subspace $S_0$. The DRC, however, is a weaker requirement since it only needs the atoms in $\mathcal{A}_c$ to be away from $\mathcal{D}_0$, a finite subset of $S_0$. Thus, Theorem 1 is implied by Theorem 2.

Both the PRC and the DRC have clear geometric interpretations. Figure 1(b) gives an illustration, in which we show the case of a two-dimensional subspace $S_0$ in $\mathbb{R}^3$. Note that by our assumption, all the atoms of $\mathcal{A}$ are on the unit sphere shown in the figure. The dictionary $\mathcal{A}_0$ and the dual points $\mathcal{D}_0$ are illustrated in blue and red, respectively; see also Figure 1(a) for an illustration in the 2D plane of the subspace $S_0$. The two solid green circles have latitude $\pm\gamma_0$ on the unit sphere, and they illustrate the PRC: the PRC holds if and only if the atoms $\mathcal{A}_c$ do not lie in the region enclosed by these two circles (i.e., they all have latitude larger than $\gamma_0$ or smaller than $-\gamma_0$). The DRC is illustrated by the yellow region, which is composed of a union of the yellow circles in the space $\mathbb{S}^2$. Each circle is centered at a normalized dual point (note that the red dots illustrate the unnormalized dual points) with radius $\gamma_0$. It can be seen that the DRC holds if and only if no point from $\mathcal{A}_c$ lies in the yellow region. This interpretation generalizes to any subspace dimension $d_0$ and ambient dimension $D$, in which case the PRC and the DRC essentially give regions on the unit sphere $\mathbb{S}^{D-1}$ in which the atoms in $\mathcal{A}_c$ should not reside. In Section IV we will revisit this geometric interpretation and analyze, under a randomized model, the parameters that affect the area of these regions.
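The latitude picture of the PRC can be checked numerically in the setting of Figure 1(b). The sketch below assumes, for illustration only, that $S_0$ is the xy-plane of $\mathbb{R}^3$, that the atoms of $\mathcal{A}_0$ are given by their polar angles in that plane, and that the helper names `covering_radius_2d`, `latitude` and `prc_holds` are hypothetical. Under these assumptions the angle between an atom and the plane is the absolute value of its latitude.

```python
import math

def covering_radius_2d(angles):
    """Half of the widest angular gap between the symmetrized atoms."""
    pts = sorted({a % math.pi for a in angles})
    gaps = [hi - lo for lo, hi in zip(pts, pts[1:])]
    gaps.append(pts[0] + math.pi - pts[-1])
    return max(gaps) / 2.0

def latitude(atom):
    """Angle between a nonzero 3D vector and the xy-plane, in [0, pi/2]."""
    x, y, z = atom
    norm = math.sqrt(x * x + y * y + z * z)
    return math.asin(min(1.0, abs(z) / norm))

def prc_holds(angles_in_plane, outside_atoms):
    """Check gamma_0 < s(A_c, S_0) when S_0 is the xy-plane."""
    gamma0 = covering_radius_2d(angles_in_plane)
    separation = min(latitude(a) for a in outside_atoms)
    return gamma0 < separation

angles = [0.0, math.pi / 3, 2 * math.pi / 3]    # covering radius of 30 degrees
far = [(math.cos(math.pi / 4), 0.0, math.sin(math.pi / 4))]  # latitude 45 degrees
near = far + [(math.cos(math.radians(20)), 0.0, math.sin(math.radians(20)))]
```

With these inputs `prc_holds(angles, far)` is true, while adding an atom at latitude 20 degrees, which falls between the two circles of latitude $\pm\gamma_0$, makes the check fail.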

The two deterministic results in Theorems 1 and 2, alongside some auxiliary results, are summarized in Figure 2. Each box contains a proposition, and the arrows denote implication relations. The topmost and the bottommost boxes are the properties of subspace-sparse recovery by BP and OMP that we are pursuing. Both of them are implied by the PRC and the DRC. In the following, we give proofs of Theorems 1 and 2 while discussing in more detail the theory of subspace-sparse recovery by BP and OMP, respectively.

Fig. 2. Summary of the results of subspace-sparse recovery with dictionary $\mathcal{A} = \mathcal{A}_0 \cup \mathcal{A}_c$. Each box contains a proposition, and arrows denote implications. The topmost (resp., bottommost) box is the property of subspace-sparse recovery by BP (resp., OMP). Two major conditions for subspace-sparse recovery are the PRC and the DRC.

B. Subspace-sparse recovery by BP 

We first establish an equivalent condition for subspace-sparse recovery by BP, and then show that this condition is implied by the PRC and the DRC. See the upper half of Figure 2 for an illustration.

1) An equivalent condition: There is an equivalent condition for BP to give subspace-sparse solutions. The result appears in the context of subspace clustering m; we rephrase it here for our problem and omit the proof.

Theorem 3. 4751/ $\mathrm{BP}(\mathcal{A}, b)$ is subspace-sparse for all $b \in S_0$ if and only if $p(\mathcal{A}_0, b) < p(\mathcal{A}_c, b)$ for all $b \in S_0 \setminus \{0\}$.

The equivalent condition requires that for any $b \in S_0 \setminus \{0\}$, $p(\mathcal{A}_0, b)$, which is the objective value of BP for recovering $b$ with dictionary $\mathcal{A}_0$ (see (4)), should be smaller than $p(\mathcal{A}_c, b)$, which is the objective value of recovering $b$ with dictionary $\mathcal{A}_c$.

2) The PRC result: We proceed to discuss how the PRC guarantees subspace-sparse recovery by BP. As noted, the PRC implies the DRC, so the PRC result follows immediately once the DRC result is proved. In the following, we nevertheless present a direct proof that the PRC implies the equivalent condition established in Theorem 3, as it gives a clearer understanding of the role of the PRC in subspace-sparse recovery by BP.

In the equivalent condition, notice that $b$ is an arbitrary point in $S_0$, so the left-hand side $p(\mathcal{A}_0, b)$ depends purely on the properties of $\mathcal{A}_0$, while the right-hand side $p(\mathcal{A}_c, b)$ depends on the relation between the atoms $\mathcal{A}_c$ and the subspace $S_0$. This suggests upper bounding the former by a characterization of $\mathcal{A}_0$ (its covering radius $\gamma_0$), and lower bounding the latter by the relation between $S_0$ and $\mathcal{A}_c$.

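Theorem 4. If the PRC, $\gamma_0 < s(\mathcal{A}_c, S_0)$, holds, then $p(\mathcal{A}_0, b) < p(\mathcal{A}_c, b)$ for all $b \in S_0 \setminus \{0\}$.

Proof: We bound the left- and right-hand sides of the target inequality separately.

First, notice that $p(\mathcal{A}_0, b) = d(\mathcal{A}_0, b) = \langle \omega, b \rangle$ by strong duality, in which $\omega$ is a dual optimal solution. Decompose $\omega$ into two orthogonal components $\omega = \omega^\perp + \omega^\parallel$, in which $\omega^\parallel \in S_0$. Then $\|A_0^\top \omega^\parallel\|_\infty = \|A_0^\top \omega\|_\infty \le 1$, where $A_0$ is a matrix composed of the atoms in $\mathcal{A}_0$ as columns. Thus, by the definition of the polar set, $\omega^\parallel \in \mathcal{K}_0^o$. One can then use Lemma 1 and get

$$p(\mathcal{A}_0, b) = \langle \omega^\parallel, b \rangle \le \|b\|_2 \|\omega^\parallel\|_2 \le \|b\|_2 / \cos\gamma_0. \tag{9}$$

On the other hand, consider the optimization problem

$$P(\mathcal{A}_c, b) = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad A_c x = b, \tag{10}$$

where $A_c$ is a matrix composed of the atoms in $\mathcal{A}_c$ as columns. If the problem is infeasible, then the objective of the above optimization is $p(\mathcal{A}_c, b) = +\infty$, and the conclusion follows trivially. Otherwise, take any $x^* \in P(\mathcal{A}_c, b)$ to be an optimal solution, so that $b = A_c x^*$. Left multiplying by $b^\top$ and manipulating the right-hand side, we have

$$\|b\|_2^2 = b^\top A_c x^* \le \|A_c^\top b\|_\infty \|x^*\|_1 = \left\| A_c^\top \frac{b}{\|b\|_2} \right\|_\infty \|b\|_2 \cdot p(\mathcal{A}_c, b) \le \cos s(\mathcal{A}_c, S_0) \cdot \|b\|_2 \cdot p(\mathcal{A}_c, b), \tag{11}$$

so $p(\mathcal{A}_c, b) \ge \|b\|_2 / \cos s(\mathcal{A}_c, S_0)$.

The conclusion thus follows by combining (9) and (11) with the condition of the PRC. ■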
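Spelled out, the final step of this proof chains the two bounds as follows. The display is only a restatement of (9) and (11) under the PRC and introduces no new quantities:

$$p(\mathcal{A}_0, b) \;\le\; \frac{\|b\|_2}{\cos\gamma_0} \;<\; \frac{\|b\|_2}{\cos s(\mathcal{A}_c, S_0)} \;\le\; p(\mathcal{A}_c, b).$$

The strict inequality in the middle uses that the cosine is strictly decreasing on $[0, \pi/2]$, so that $\gamma_0 < s(\mathcal{A}_c, S_0)$ gives $\cos\gamma_0 > \cos s(\mathcal{A}_c, S_0)$. If $s(\mathcal{A}_c, S_0) = \pi/2$, the middle term is read as $+\infty$.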

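3) The DRC result: To prove that the DRC implies subspace-sparse recovery by BP, we need a statement that is weaker than the DRC but more convenient to work with; see the rightmost box of Figure 2.

Lemma 4. If the DRC, $\gamma_0 < s(\mathcal{A}_c, \mathcal{D}_0)$, holds, then $\|A_c^\top v\|_\infty < 1, \forall v \in \mathcal{D}_0$.

Proof: For any $v \in \mathcal{D}_0$, we know that $v \in \mathcal{K}_0^o$. Thus, we can use Lemma 1 to bound $v$ as $\|v\|_2 \le 1/\cos\gamma_0$. Consequently, since the atoms in $\mathcal{A}_c$ have unit norm and $\mathcal{D}_0$ is symmetric,

$$\|A_c^\top v\|_\infty = \max_{j \in \mathcal{J}_c} |\langle a_j, v \rangle| = \|v\|_2 \max_{j \in \mathcal{J}_c} |\cos s(a_j, v)| \le \frac{\cos s(\mathcal{A}_c, \mathcal{D}_0)}{\cos\gamma_0} < 1. \tag{12}$$

■

Theorem 5. If $\|A_c^\top v\|_\infty < 1, \forall v \in \mathcal{D}_0$ holds, then $p(\mathcal{A}_0, b) < p(\mathcal{A}_c, b)$ for all $b \in S_0 \setminus \{0\}$.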

Proof: To prove the result, we need some basic results from linear programming. Consider the linear program

$$\arg\max_\omega \langle \omega, b \rangle \quad \text{s.t.} \quad \|A_0^\top \omega\|_\infty \le 1, \; \omega \in S_0. \tag{13}$$

Note that the feasible region of (13) is $\mathcal{K}_0^o$, and it is bounded because it is a convex body. By the theory of linear programming (e.g., HU), there must be a solution to (13) that is an extreme point of $\mathcal{K}_0^o$. Thus, we can always find a solution of (13) that is in the set of dual points $\mathcal{D}_0$.

Now let us consider the optimization problem $D(\mathcal{A}_0, b)$, rewritten below for convenience:

$$D(\mathcal{A}_0, b) := \arg\max_\omega \langle \omega, b \rangle \quad \text{s.t.} \quad \|A_0^\top \omega\|_\infty \le 1. \tag{14}$$

Note that this program differs from (13) only in the constraint. The claim is that, despite this change, there is still at least one optimal solution to (14) that is in $\mathcal{D}_0$. This follows from the fact that both $b$ and the columns of $A_0$ are in $S_0$, thus any solution $\omega$ to (14) can be decomposed into two parts as $\omega = \omega^\parallel + \omega^\perp$, in which $\omega^\parallel$ is a solution to (13) and $\omega^\perp$ is orthogonal to $S_0$.

With the above discussion in place, we now turn to the proof. The proof is trivial if $p(\mathcal{A}_c, b) = +\infty$, since $P(\mathcal{A}_0, b)$ always has feasible solutions and thus $p(\mathcal{A}_0, b)$ is finite.

Otherwise, take any $x^* \in P(\mathcal{A}_c, b)$ to be a primal optimal solution, so that $b = A_c x^*$. On the other hand, we have shown that there exists an optimal dual solution $\omega^* \in D(\mathcal{A}_0, b)$ that is in $\mathcal{D}_0$. Thus,

$$p(\mathcal{A}_0, b) = d(\mathcal{A}_0, b) = \langle \omega^*, b \rangle = \langle \omega^*, A_c x^* \rangle \le \|A_c^\top \omega^*\|_\infty \cdot \|x^*\|_1 < p(\mathcal{A}_c, b), \tag{15}$$

in which $\|A_c^\top \omega^*\|_\infty < 1$ by assumption, and $\|x^*\|_1 = p(\mathcal{A}_c, b)$ since $x^*$ is an optimal solution. ■

C. Subspace-sparse recovery by OMP 

The lower half of Figure 2 summarizes the results for subspace-sparse recovery by OMP. Perhaps surprisingly, the results have a structure that is symmetric to that of BP. First, we show an equivalent condition for subspace-sparse recovery by OMP. Then we show that this condition is implied by the PRC and the DRC.

1) An equivalent condition:

Theorem 6. $\mathrm{OMP}(\mathcal{A}, b)$ is subspace-sparse for all $b \in S_0$ if and only if $s(\mathcal{A}_0, \{\pm b\}) < s(\mathcal{A}_c, \{\pm b\})$ for all $b \in S_0 \setminus \{0\}$.

Proof: The “only if” part is straightforward: if $s(\mathcal{A}_0, \{\pm b\}) \ge s(\mathcal{A}_c, \{\pm b\})$ for some $b$, then for this specific $b$, $\mathrm{OMP}(\mathcal{A}, b)$ can pick a point from $\mathcal{A}_c$ in its first step.

The other direction follows by induction over the steps of the OMP algorithm. Specifically, for any given $b \in S_0$, the first step of $\mathrm{OMP}(\mathcal{A}, b)$ chooses an atom from $\mathcal{A}_0$, and this gives a residual that is again in $S_0$, which then guarantees that the next step of $\mathrm{OMP}(\mathcal{A}, b)$ also chooses an atom from $\mathcal{A}_0$. ■

Thus, the equivalent condition requires that for any point $b \in S_0 \setminus \{0\}$, the closest point to either $b$ or $-b$ in the entire dictionary $\mathcal{A}$ should be in $\mathcal{A}_0$.
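The link between this condition and the selection rule of OMP can be made explicit. For unit-norm atoms and $b \neq 0$, the spherical distance to the symmetrized point $\{\pm b\}$ depends only on the absolute inner product. The identity below follows from Definition 1 and is stated here as a reading aid:

$$s(\mathcal{A}', \{\pm b\}) = \min_{a \in \mathcal{A}'} \arccos \frac{|\langle a, b \rangle|}{\|b\|_2} = \arccos \left( \max_{a \in \mathcal{A}'} \frac{|\langle a, b \rangle|}{\|b\|_2} \right),$$

where $\mathcal{A}'$ stands for either $\mathcal{A}_0$ or $\mathcal{A}_c$. Since $\arccos$ is decreasing, the condition $s(\mathcal{A}_0, \{\pm b\}) < s(\mathcal{A}_c, \{\pm b\})$ is the same as $\max_{j \in \mathcal{J}_0} |\langle a_j, b \rangle| > \max_{j \in \mathcal{J}_c} |\langle a_j, b \rangle|$, i.e., the atom with the maximum absolute inner product with $b$ lies in $\mathcal{A}_0$.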


2) The PRC result: Similar to the discussion for BP, the term $s(\mathcal{A}_0, \{\pm b\})$ on the left-hand side of the equivalent condition depends on $\mathcal{A}_0$ and can be upper bounded by the covering radius $\gamma_0$, while the term $s(\mathcal{A}_c, \{\pm b\})$ depends on the relation between $S_0$ and $\mathcal{A}_c$ and can be bounded from below.

Theorem 7. If the PRC, $\gamma_0 < s(\mathcal{A}_c, S_0)$, holds, then $s(\mathcal{A}_0, \{\pm b\}) < s(\mathcal{A}_c, \{\pm b\})$ for all $b \in S_0 \setminus \{0\}$.

Proof: We prove this by bounding each side of the target inequality separately.

For the left-hand side, recall that $\gamma_0 := \gamma(\pm\mathcal{A}_0)$; then by the definition of the covering radius, $\gamma_0 \ge s(\mathcal{A}_0, \{\pm b\})$.

For the right-hand side, we have $s(\mathcal{A}_c, \{\pm b\}) \ge s(\mathcal{A}_c, S_0)$ by the definition of the notation $s(\cdot, \cdot)$.

The conclusion thus follows by concatenating the bounds for both sides above with the PRC. ■

3) The DRC result: Finally, we prove the result for the DRC by showing that the statement in the rightmost box of Figure 2 guarantees the equivalent condition for OMP.

Theorem 8. If $\|A_c^\top v\|_\infty < 1, \forall v \in \mathcal{D}_0$ holds, then $s(\mathcal{A}_0, \{\pm b\}) < s(\mathcal{A}_c, \{\pm b\})$ for all $b \in S_0 \setminus \{0\}$.

To prove this theorem, we use the result that the polar set $\mathcal{K}_0^o$ induces a norm on the space $S_0$ by means of the so-called Minkowski functional.

Definition 8. The Minkowski functional of a set $\mathcal{K}$ is defined on $\mathrm{span}(\mathcal{K})$ as

$$\|v\|_{\mathcal{K}} = \inf\{t > 0 : v/t \in \mathcal{K}\}. \tag{16}$$

Lemma 5. 4471/ If $\mathcal{K}$ is a symmetric convex body, then $\|\cdot\|_{\mathcal{K}}$ is a norm on $\mathrm{span}(\mathcal{K})$ with $\mathcal{K}$ being the unit ball.

By this result, $\|\cdot\|_{\mathcal{K}_0^o}$ is a norm on $S_0$ since $\mathcal{K}_0^o$ is a symmetric convex body; see the discussion following Definition 4.
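For the polar set used here, the Minkowski functional has a simple closed form. The short derivation below only combines Definition 8 with the description of the polar set as $\{v \in S_0 : |\langle v, a_i \rangle| \le 1, \forall i \in \mathcal{J}_0\}$; it is a worked illustration added for the reader:

$$\|v\|_{\mathcal{K}_0^o} = \inf\{t > 0 : |\langle a_i, v/t \rangle| \le 1, \forall i \in \mathcal{J}_0\} = \max_{i \in \mathcal{J}_0} |\langle a_i, v \rangle| = \|A_0^\top v\|_\infty, \quad v \in S_0.$$

In particular, $\|b\|_{\mathcal{K}_0^o} = 1$ means that $\|A_0^\top b\|_\infty = 1$, which is the form in which this norm enters the proof of Theorem 8.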
Proof of Theorem 8: It suffices to prove the result for every $b \in S_0 \setminus \{0\}$ that has unit norm, using any norm defined on $S_0$. Here the norm we use is the Minkowski functional $\|\cdot\|_{\mathcal{K}_0^o}$, and we need to prove that $s(\mathcal{A}_0, \{\pm b\}) < s(\mathcal{A}_c, \{\pm b\})$ for all $b \in S_0$ such that $\|b\|_{\mathcal{K}_0^o} = 1$.

Since $\|b\|_{\mathcal{K}_0^o} = 1$, we have $b \in \mathcal{K}_0^o$. By Lemma 3, $b$ can thus be written as a convex combination of the dual points, i.e., one can write $b = \sum_i \lambda_i v_i$, in which $v_i \in \mathcal{D}_0$, $\lambda_i \in [0, 1]$ for all $i$, and $\sum_i \lambda_i = 1$. Thus,

$$\|A_c^\top b\|_\infty = \Big\| A_c^\top \sum_i \lambda_i v_i \Big\|_\infty \le \sum_i \lambda_i \|A_c^\top v_i\|_\infty < \sum_i \lambda_i = 1 = \|A_0^\top b\|_\infty, \tag{17}$$

in which the last equality follows from $\|b\|_{\mathcal{K}_0^o} = 1$. One then divides both sides of (17) by $\|b\|_2$ and takes the arccos, and the conclusion follows. ■

IV. Subspace-Sparse Recovery: 

Randomized Result 

In this section, we discuss the properties of subspace-sparse recovery under a randomized model. The analysis is built upon the deterministic DRC condition of Section III. We show that under a certain randomized model of the data, the DRC is satisfied with a certain probability. A roadmap of the proof of the result is also provided.


A. Main result 

Theorem 9. Let $\mathcal{A} = \{a_j \in \mathbb{R}^D, j \in \mathcal{J}\}$ be a dictionary such that $\mathcal{A}_0$ contains $s_0$ points randomly and uniformly sampled on the unit sphere of some subspace $S_0$ with dimension $d_0 < D$, and $\mathcal{A}_c$ contains points randomly and uniformly sampled on the unit sphere $\mathbb{S}^{D-1}$. Let $\rho_0 = s_0/d_0$ be the “density” of points in $S_0$, and let $\lambda = \mathrm{card}(\mathcal{A}_c)/s_0$. Under the conditions that $2 \le d_0 \le \sqrt{D/2}$ and $\rho_0 > 1$, the DRC is satisfied with probability

$$p > 1 - \frac{d_0 \cdot 2^{d_0}}{C(D, d_0)} \sqrt{\rho_0} \, e^{-C(D, d_0)\sqrt{\rho_0}} - \lambda d_0 \frac{(2e)^{d_0}}{(\rho_0)^{k_0}}, \tag{18}$$

in which $k_0 = \ldots - d_0$, and $C(D, d_0)$ is increasing in $D$, decreasing in $d_0$, and lower bounded by $0.79 \sqrt{d_0} / 2.07^{d_0 - 1}$.

This theorem asserts that if the dictionary A is generated 
under this random model and satisfies the condition on do, 
then both BP and OMP give subspace-sparse recovery for any 
point b G 5o with a probability specified in the theorem. The 
condition that do ^ 1 is an artifact introduced by the technique 
of the proof; one can easily see that if do = 1 then subspace- 
sparse representation can be recovered with probability iQ. 

Notice that the condition $D \ge 2d_0^2$ requires $D$ to be large
and $d_0$ to be small, and as long as the condition is satisfied, the
guaranteed probability of success also increases as $D$ increases
and as $d_0$ decreases (for large enough $\rho_0$). This conforms with
the previous observations that the subspace-sparse recovery 
works better in cases of low dimensional subspace in high 
dimensional ambient space El. Moreover, the probability 
is a decreasing function of $\lambda$, showing that subspace-sparse
recovery becomes harder if more points are added to $\mathcal{A}_c$.
Finally, the probability goes to 1 as the sample density
$\rho_0$ goes to infinity; thus one can achieve arbitrary confidence
in subspace-sparse recovery by making the number
of samples large enough.


B. Geometric interpretations 

We continue the discussion of the geometric interpretations
of DRC in Section III and analyze the factors that affect
the geometry of the problem under the randomized model in
Theorem 9.

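We first introduce some definitions. Recall that we use the
notation $\mathbb{S}^{D-1} = \{v \in \mathbb{R}^D : \|v\|_2 = 1\}$ to denote the unit
sphere. Denote by $\sigma_{D-1}$ the uniform area measure on $\mathbb{S}^{D-1}$.
For a given $w \in \mathbb{S}^{D-1}$ and a $\theta \in [0, \pi]$, the spherical cap is a
subset of $\mathbb{S}^{D-1}$ which is defined as

$$\mathbb{S}^{D-1}_\theta(w) = \{v \in \mathbb{S}^{D-1} : s(w, v) \le \theta\}. \qquad (19)$$

By this definition, each yellow circle in Figure 1(b) is a
spherical cap $\mathbb{S}^{D-1}_{\gamma_0}(w)$, $w \in \mathcal{D}_0$, and the DRC requires that
the points in $\mathcal{A}_c$ do not lie in the union of these spherical caps.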

*The proof is left as an exercise. 


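With a random sampling of points in $\mathcal{A}_c$, the chance that the DRC
is satisfied is determined by

$$\sigma_{D-1}\Big(\bigcup_{w \in \mathcal{D}_0} \mathbb{S}^{D-1}_{\gamma_0}(w)\Big) \Big/ \sigma_{D-1}(\mathbb{S}^{D-1}), \qquad (20)$$

which is the area of the spherical caps relative to the area of
$\mathbb{S}^{D-1}$. Obviously, the smaller the quantity in (20), the easier it
is for the DRC to be satisfied.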

Consider increasing $D$ with all other parameters fixed in
the randomized model of Theorem 9. Note that the number
and the radius of the spherical caps $\mathbb{S}^{D-1}_{\gamma_0}(w)$, $w \in \mathcal{D}_0$, are
statistically independent of $D$, so we consider $\mathrm{card}(\mathcal{D}_0)$ and $\gamma_0$
as fixed. It is known that the area of a spherical cap relative to
the entire sphere, i.e. $\sigma_{D-1}(\mathbb{S}^{D-1}_{\gamma_0}(\cdot))/\sigma_{D-1}(\mathbb{S}^{D-1})$, becomes
smaller for higher dimension $D$. Thus, as $D$ increases, the
yellow region given by the DRC shrinks, and the DRC becomes
easier to satisfy.
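The effect of the ambient dimension can be illustrated numerically. The following is a minimal sketch and not part of the analysis: it assumes a fixed cap radius of $\pi/3$ (an arbitrary illustrative value) and evaluates the relative cap area by numerical integration, using the standard fact that the polar angle of a uniform point on $\mathbb{S}^{D-1}$ has density proportional to $\sin^{D-2}\phi$. The function name `cap_fraction` is ours.

```python
import math


def cap_fraction(D, theta, steps=20000):
    """Relative area of a spherical cap of angular radius theta on S^{D-1}.

    The polar angle phi of a uniform point on S^{D-1} has density
    proportional to sin(phi)^(D-2) on [0, pi]; both integrals below are
    approximated with the midpoint rule.
    """
    def integral(upper):
        h = upper / steps
        return h * sum(math.sin((i + 0.5) * h) ** (D - 2) for i in range(steps))

    return integral(theta) / integral(math.pi)


gamma0 = math.pi / 3  # assumed cap radius, for illustration only
for D in (3, 10, 50):
    # For D = 3 the closed form (1 - cos(gamma0)) / 2 = 0.25 is available.
    print(D, cap_fraction(D, gamma0))
```

With the cap radius held fixed, the printed fractions shrink as $D$ grows, which is the behavior described above.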

Consider now that $D$ is fixed and $d_0$ is varied. Intuitively,
given a fixed number of points, it is easier to “cover” a
lower dimensional unit sphere $\mathbb{S}^{d_0-1}$. In other words, the
covering radius $\gamma_0$ decreases as $d_0$ decreases. Thus, decreasing
$d_0$ has the effect of shrinking the yellow spherical caps in
Figure 1(b), making the DRC easier to satisfy.


C. Roadmap of proof 

We provide a roadmap of the proof of Theorem 9. This is
achieved by providing probabilistic bounds on both sides of
DRC separately. In the following, we start by presenting
relevant geometric results.

1) Preliminary geometric results: Let $B^p(r) := \{v \in \mathbb{R}^p :
\|v\|_2 \le r\}$ be a ball of radius $r$ in the space $\mathbb{R}^p$. It is well known
that its volume can be computed in closed form, i.e.,

$$\mathrm{vol}(B^p(r)) = v_p \cdot r^p, \quad \text{where } v_p = \pi^{p/2} / \Gamma(\tfrac{p}{2} + 1), \qquad (21)$$

in which $\mathrm{vol}(\cdot)$ denotes the volume, and $\Gamma(\cdot)$ is the Gamma
function.
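As a quick check of (21), the formula can be evaluated for small $p$, where the values are familiar. This is only an illustrative sketch; the function names are ours.

```python
import math


def unit_ball_volume(p):
    """v_p = pi^(p/2) / Gamma(p/2 + 1), the volume of the unit ball in R^p."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)


def ball_volume(p, r):
    """vol(B^p(r)) = v_p * r^p, as in (21)."""
    return unit_ball_volume(p) * r ** p


print(unit_ball_volume(1))  # length of the interval [-1, 1]
print(unit_ball_volume(2))  # area of the unit disk
print(unit_ball_volume(3))  # volume of the unit ball in R^3
```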

Based on this, we can further estimate the area of the
spherical cap defined in (19) by the following result.


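Lemma 6. For any $\theta \in [0, \pi/2]$ and any $p \ge 2$,

$$\frac{v_{p-1}}{p\,v_p} \sin^{p-1}\theta \le \frac{\sigma_{p-1}(\mathbb{S}^{p-1}_\theta(w))}{\sigma_{p-1}(\mathbb{S}^{p-1})} \le \frac{v_{p-1}}{v_p} \sin^{p-1}\theta, \qquad (22)$$

in which $v_p$ is defined in (21).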
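The two bounds of Lemma 6 can be checked against a case where the cap area is known exactly. The sketch below assumes $p = 3$, where the relative area of a cap of angular radius $\theta$ on $\mathbb{S}^2$ is $(1 - \cos\theta)/2$; the helper names are ours.

```python
import math


def v(p):
    """Volume of the unit ball in R^p, as in (21)."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)


def lemma6_bounds(p, theta):
    """Lower and upper bounds of (22) on the relative cap area."""
    s = math.sin(theta) ** (p - 1)
    lower = v(p - 1) / (p * v(p)) * s
    upper = v(p - 1) / v(p) * s
    return lower, upper


def cap_fraction_3d(theta):
    """Exact relative area of a cap on S^2: 2*pi*(1 - cos(theta)) / (4*pi)."""
    return (1.0 - math.cos(theta)) / 2.0


for k in range(1, 10):
    theta = k * math.pi / 18  # 10, 20, ..., 90 degrees
    lower, upper = lemma6_bounds(3, theta)
    print(k * 10, lower, cap_fraction_3d(theta), upper)
```

For $p = 3$ the bounds reduce to $\sin^2\theta/4$ and $3\sin^2\theta/4$, so each printed row should be ordered from smallest to largest.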


Equipped with this result, one can give a probabilistic lower 
bound on the RHS of DRC as follows. 

^This is known as the phenomenon of concentration of measure, see, e.g.
[48]. This can also be seen from Lemma 6, which shows that the ratio of
areas is upper bounded by $\frac{v_{D-1}}{v_D} \sin^{D-1}\gamma_0$, which goes to 0 as $D$
increases to infinity.

2) A lower bound on RHS of DRC: Notice that according
to the probabilistic model in Theorem 9, an arbitrary point
$v \in \mathcal{D}_0$ and an arbitrary point $w \in \mathcal{A}_c$ are independent.
Moreover, the point $w$ is uniformly distributed on the unit
sphere, so the angle $s(w, v)$ behaves as if $v$ were held
fixed and $w$ were uniformly distributed on $\mathbb{S}^{D-1}$ at
random. By using the upper bound on the area of a spherical cap
in (22), one gets for any $\gamma^* \in [0, \pi/2]$ that $P(s(v, w) >
\gamma^*) \ge 1 - \frac{v_{D-1}}{v_D} \sin^{D-1}\gamma^*$. One can then apply the union bound
over all pairs of points in $\mathcal{D}_0 \times \mathcal{A}_c$. Noticing that $\mathrm{card}(\mathcal{D}_0) \le \binom{s_0}{d_0} \cdot 2^{d_0}$
by Lemma 2, and that $\mathrm{card}(\mathcal{A}_c) = \lambda \cdot s_0$, we get

$$P(s(\mathcal{D}_0, \mathcal{A}_c) > \gamma^*) \ge 1 - \lambda s_0 \cdot \binom{s_0}{d_0} 2^{d_0} \cdot \frac{v_{D-1}}{v_D} \sin^{D-1}\gamma^*. \qquad (23)$$

We are left to give an upper bound on the LHS of DRC.
Essentially, we need to give a probabilistic bound on the
covering radius.

3) An upper bound on covering radius: Given the unit
sphere $\mathbb{S}^{p-1}$ and a positive integer $M$, we consider the following
question: if $M$ points are drawn independently and uniformly
at random from the sphere $\mathbb{S}^{p-1}$, how well spread out are they
in terms of covering radius? Intuitively, as more points
are sampled, the unit sphere is expected to be better covered
by the samples and the covering radius is expected to be
smaller. In the following, we give a rigorous statement of this
intuition; proofs are deferred to the appendix. Our proof draws
inspiration from the work [29]. The idea is simple: assume
that there is a set of circles of radius $\epsilon$ on $\mathbb{S}^{p-1}$ that cover
the entire unit sphere (i.e., an $\epsilon$-covering as defined below).
If the $M$ sample points are distributed on $\mathbb{S}^{p-1}$ in such a way that
every small circle contains at least one sample point, then the
covering radius is bounded by $2\epsilon$. Before discussing
how this is realized, we first introduce two definitions.

Definition 9. A set $\mathcal{V} \subseteq \mathbb{S}^{p-1}$, $p \ge 2$, is called an $\epsilon$-covering of $\mathbb{S}^{p-1}$
if the covering radius of $\mathcal{V}$ is no more than $\epsilon$. Given $\epsilon > 0$,
the covering number of $\mathbb{S}^{p-1}$, denoted by $C(\mathbb{S}^{p-1}, \epsilon)$, is the
cardinality of the smallest $\epsilon$-covering of $\mathbb{S}^{p-1}$.

First, it is desirable to find an $\epsilon$-covering of $\mathbb{S}^{p-1}$ with as
small a cardinality as possible.

Lemma 7. The covering number of $\mathbb{S}^{p-1}$, $p \ge 2$, is bounded by

$$C(\mathbb{S}^{p-1}, \epsilon) \le \frac{p\,v_p}{v_{p-1}} \cdot \frac{1}{\sin^{p-1}(\epsilon/2)}.$$

Given this, we further lower bound the probability that every
circle in the $\epsilon$-covering contains at least one sample point, from
which the bound on the covering radius is obtained.

Theorem 10. Let $\mathcal{P} \subseteq \mathbb{S}^{p-1}$, $p \ge 2$, be a set of $K$ points that
are drawn independently and uniformly at random on $\mathbb{S}^{p-1}$.
Then for any $\gamma^* \le \pi/2$, it holds that $\gamma(\pm\mathcal{P}) \le \gamma^*$ with probability
at least $1 - \frac{p\,v_p}{v_{p-1} \sin^{p-1}(\gamma^*/4)} \exp\big(-\frac{2 v_{p-1}}{p\,v_p} K \sin^{p-1}\frac{\gamma^*}{2}\big)$.

With this result, the LHS of DRC is upper bounded as
follows:

$$P(\gamma_0 \le \gamma^*) \ge 1 - \frac{d_0\, v_{d_0}}{v_{d_0-1}} \cdot \frac{1}{\sin^{d_0-1}(\gamma^*/4)} \exp\Big(-s_0 \frac{2 v_{d_0-1}}{d_0\, v_{d_0}} \sin^{d_0-1}\frac{\gamma^*}{2}\Big) \qquad (24)$$

for every $\gamma^* \in (0, \pi/2]$.

4) Final proof: By combining (23) and (24), we obtain a
lower bound on the probability that $\gamma_0 < s(\mathcal{D}_0, \mathcal{A}_c)$ in terms of
the parameter $\gamma^* \in (0, \pi/2]$. The result in (18) is subsequently
obtained by taking a specific value of $\gamma^*$. The details are
deferred to the appendix.
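The intuition that the covering radius of a symmetrized sample shrinks as more points are drawn can be illustrated in the simplest case $p = 2$, where it has a closed form. The sketch below is only an illustration under that assumption: on the circle, the points $\pm(\cos a, \sin a)$ can be represented by the angle $a$ modulo $\pi$, and the covering radius is half of the largest gap between consecutive angles. The function names, the sample sizes and the seed are ours.

```python
import math
import random


def covering_radius_circle(angles):
    """Covering radius of the symmetrized set {+-(cos a, sin a)} on S^1.

    Antipodal points are identified by reducing angles modulo pi; the point
    that is worst covered sits in the middle of the largest gap.
    """
    reduced = sorted(a % math.pi for a in angles)
    gaps = [b - a for a, b in zip(reduced, reduced[1:])]
    gaps.append(reduced[0] + math.pi - reduced[-1])  # wrap-around gap
    return max(gaps) / 2.0


def mean_covering_radius(K, trials=200, seed=0):
    """Average covering radius over random samples of K uniform points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(K)]
        total += covering_radius_circle(sample)
    return total / trials


# Two orthogonal atoms: the symmetrized set is {+-e1, +-e2}, radius pi/4.
print(covering_radius_circle([0.0, math.pi / 2]))
for K in (5, 50, 500):
    print(K, mean_covering_radius(K))
```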


V. Applications 

In this section, we apply the theoretical results of the
previous sections to the analysis of traditional sparse
recovery. In this process, we also establish the relation between
the PRC/DRC and the mutual coherence condition in sparse
recovery. Moreover, we discuss the application of our results
to the analysis of sparse representation based classification.

A. Sparse Recovery 

In sparse recovery, the task is to reconstruct an $s_0$-sparse
signal $x$ (i.e. $x$ has at most $s_0$ nonzero entries) from the
observation $b = Ax$ for some dictionary $\mathcal{A}$. In order to
analyze the problem by the subspace-sparse representation
results, we take the set $J_0$ to be the $s_0$ columns corresponding
to the nonzero entries of $x$ and get a partition of $\mathcal{A}$ into
$\mathcal{A}_0 \cup \mathcal{A}_c$. If $\mathcal{A}_0$ has the property that its atoms are linearly
independent, then $x$ is the unique subspace-sparse solution.
In this case, sparse recovery and subspace-sparse
recovery are equivalent, in the sense that if one guarantees
finding a subspace-sparse representation, then correct sparse
recovery is achieved. Consequently, by using our PRC
and DRC results, we have the following result.

Theorem 11. Given a dictionary $\mathcal{A}$, any $s_0$-sparse vector
$x$ can be recovered from the observation $b = Ax$ by
BP and OMP if for any partition of $\mathcal{A}$ into $\mathcal{A}_0$ and $\mathcal{A}_c$
where $\mathrm{card}(\mathcal{A}_0) = s_0$, the atoms in $\mathcal{A}_0$ are linearly
independent and PRC (respectively, DRC) holds.

This result serves as a new condition for guaranteeing
reconstruction of sparse signals. Its geometric interpretation
is the same as that of PRC and DRC for subspace-sparse
recovery, i.e., any $s_0$ atoms of the dictionary should
be well distributed in their span, while all other atoms should
be sufficiently far away from this span (by PRC) or from a subset
of the span (by DRC).

For the purpose of checking the conditions of the theorem,
if any $s_0$ atoms in $\mathcal{A}$ are linearly independent, then subsequent
checking of the PRC and DRC is easy, as explained below.
First, the dual points $\mathcal{D}_0$ can be written out explicitly:

Lemma 8. For $\mathcal{A}_0$ which has $s_0$ linearly independent atoms,
the set of dual points, $\mathcal{D}_0$, contains exactly $2^{s_0}$ points
specified by $\{A_0(A_0^\top A_0)^{-1} \cdot u, u \in U_{s_0}\}$, where $U_{s_0} :=
\{[u_1, \cdots, u_{s_0}], u_i = \pm 1, i = 1, \cdots, s_0\}$.

The proof is in the appendix. With the dual points, one
can then compute $s(\mathcal{A}_c, \mathcal{S}_0)$ and $s(\mathcal{A}_c, \mathcal{D}_0)$ on the RHS of
PRC and DRC. Moreover, the covering radius $\gamma_0$ can also be
computed by the relation in Lemma 1, i.e.

$$\cos\gamma_0 = 1/\max\{\|v\|_2 : v \in \mathcal{K}_0^o\} = 1/\max\{\|v\|_2 : v \in \mathcal{D}_0\}, \qquad (25)$$

where the last equality follows from the fact that $\mathcal{D}_0$ is the
set of extreme points of $\mathcal{K}_0^o$. Thus, all terms in PRC and DRC
can be computed.
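The computation described above can be sketched in a few lines. The code below is a minimal illustration, assuming linearly independent unit-norm atoms given as plain Python lists; it enumerates the $2^{s_0}$ dual points of Lemma 8 and evaluates the covering radius through (25). The function names and the small Gaussian-elimination solver are ours and stand in for any linear algebra library.

```python
import itertools
import math


def solve(G, u):
    """Solve G w = u by Gaussian elimination with partial pivoting."""
    n = len(G)
    M = [list(map(float, row)) + [float(u[i])] for i, row in enumerate(G)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w


def dual_points(atoms):
    """All points A0 (A0^T A0)^{-1} u with u in {+1, -1}^{s0} (Lemma 8)."""
    s0, dim = len(atoms), len(atoms[0])
    gram = [[sum(x * y for x, y in zip(a, b)) for b in atoms] for a in atoms]
    points = []
    for u in itertools.product((1.0, -1.0), repeat=s0):
        w = solve(gram, u)  # w = (A0^T A0)^{-1} u
        points.append([sum(w[j] * atoms[j][i] for j in range(s0))
                       for i in range(dim)])
    return points


def covering_radius(atoms):
    """gamma_0 from (25): cos(gamma_0) = 1 / max ||v||_2 over dual points."""
    max_norm = max(math.sqrt(sum(x * x for x in v)) for v in dual_points(atoms))
    return math.acos(1.0 / max_norm)


# Two orthonormal atoms in R^3: the dual points are (+-1, +-1, 0).
print(covering_radius([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
# Two atoms at 60 degrees in the plane.
print(covering_radius([[1.0, 0.0], [0.5, math.sqrt(3) / 2]]))
```

For the orthonormal pair the covering radius is $\pi/4$, and for the pair at 60 degrees it is $\pi/3$, which matches the half-width of the largest angular gap between the symmetrized atoms.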

At the end of this section, we point out that the result of
Theorem 11 can be compared with traditional sparse recovery
results. Specifically, we compare it with the result that uses
mutual coherence, $\mu(\mathcal{A})$, which is defined as the largest
absolute inner product between distinct atoms of $\mathcal{A}$. It is known that
$\mu(\mathcal{A}) < \frac{1}{2s_0 - 1}$ is a sufficient condition for OMP and BP [6],
[13] to recover $s_0$-sparse signals. We show that this is a stronger
requirement than that of Theorem 11.

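Theorem 12. If a dictionary $\mathcal{A}$ satisfies $\mu(\mathcal{A}) < \frac{1}{2s_0 - 1}$, then
for any partition of $\mathcal{A}$ into $\mathcal{A}_0$ and $\mathcal{A}_c$ where $\mathrm{card}(\mathcal{A}_0) = s_0$,
the atoms in $\mathcal{A}_0$ are linearly independent and
PRC and DRC hold.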

The proof is in the appendix. This result shows that the
PRC/DRC conditions in Theorem 11 are implied by the
mutual coherence condition. While the mutual coherence
condition requires all atoms of $\mathcal{A}$ to be incoherent with each
other, the PRC and DRC provide more detailed requirements,
in terms of the distribution of the points in $\mathcal{A}_0$ as well as the relation
between $\mathcal{A}_0$ and $\mathcal{A}_c$.
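For comparison, the mutual coherence condition is straightforward to evaluate. The sketch below assumes the atoms are given as unit-norm lists of floats; the function names are ours.

```python
import math


def mutual_coherence(atoms):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    best = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            inner = sum(x * y for x, y in zip(atoms[i], atoms[j]))
            best = max(best, abs(inner))
    return best


def coherence_condition_holds(atoms, s0):
    """Check the sufficient condition mu(A) < 1 / (2 * s0 - 1)."""
    return mutual_coherence(atoms) < 1.0 / (2 * s0 - 1)


r = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0], [0.0, 1.0], [r, r]]
print(mutual_coherence(atoms))
print(coherence_condition_holds(atoms, 1), coherence_condition_holds(atoms, 2))
```

In this toy dictionary the coherence is $1/\sqrt{2}$, so the condition holds for $s_0 = 1$ but fails for $s_0 = 2$, where the threshold is $1/3$.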


B. Sparse Classification 

We can use the deterministic and randomized results for
subspace-sparse recovery for the analysis of the sparse
representation based classification (SRC) method. Assume that
we are given a dictionary $\mathcal{A} := \{a_j, j \in J\}$ which contains
data from a union of $n$ subspaces, i.e., there exists a partition
of $J$ into $J_1, \cdots, J_n$, such that any two different sets $J_i$
and $J_j$ do not intersect and $\bigcup_i J_i = J$, and that
$\mathcal{A}_i := \{a_j, j \in J_i\}$ contains points from a low dimensional
subspace $\mathcal{S}_i$. Following the notational tradition, we assume
that the $i$-th group has $s_i$ points in a subspace of dimension $d_i$,
and the geometric quantities $\gamma_i$, $\mathcal{K}_i$, $\mathcal{K}_i^o$ and $\mathcal{D}_i$ can all be
defined.

The task in classification is the following: given this dictionary $\mathcal{A}$,
for which we have explicit knowledge of the partition $\{J_i\}_{i=1}^n$,
we want to find the membership of any other point $b$ that
lies in the union of subspaces $\bigcup_i \mathcal{S}_i$, i.e., to determine which
specific subspace it belongs to. In the work of [?], the
authors proposed the SRC, which finds a sparse representation
of $b$ as in (1) by BP or OMP. Ideally, the coefficient vector
$x$ for representing $b$ is subspace-sparse, i.e. is such that the
nonzero entries of $x$ are all in the set $J_i$ in which $i$ is the
index of the subspace that $b$ belongs to, so the query $b$
can be correctly classified. Other techniques have been proposed for
SRC to make the method more robust, so that one can classify a point
when the representation $x$ has nonzero coefficients in two or
more groups. Here, however, we analyze the conditions for
guaranteeing subspace-sparse recovery, which is sufficient for
SRC to give the correct class label.

^We assume that any two subspaces intersect only at the origin, so that 
such membership is unique. 

^While it is proposed to use BP in [?], the idea can be easily extended to
using OMP. We study both of them.


First, our PRC and DRC results from Section III
can be easily applied here for analyzing when a correct
classification can be guaranteed. Here, we use the DRC result,
and formulate the following theorem.

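Theorem 13. Given $\mathcal{A} = \bigcup_{i=1}^n \mathcal{A}_i$, assume $\|a_j\|_2 = 1, \forall a_j \in \mathcal{A}$.
Subspace classification by BP and OMP succeeds for any
point $b \in \bigcup_{i=1}^n \mathcal{S}_i$ if

$$\gamma_i < s(\mathcal{D}_i, \mathcal{A} \backslash \mathcal{A}_i), \quad \forall i = 1, \cdots, n, \qquad (26)$$

in which $\gamma_i$ is the covering radius of $\pm\mathcal{A}_i$, $\mathcal{D}_i$ is the set of dual
points of $\mathcal{A}_i$, and the backslash in $\mathcal{A} \backslash \mathcal{A}_i$ denotes the set difference.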


This theorem asserts that we need the dictionary to have
well-distributed points in each of the subspaces so that $\gamma_i$ is
small. Also, the dual points $\mathcal{D}_i$, which are in subspace $\mathcal{S}_i$, need
to be not too close to points in all other subspaces.
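The decision rule that subspace-sparse recovery makes possible can be sketched as follows. This is only an illustration of the idea described above, not the SRC procedure of the cited work: it assumes a coefficient vector `x` has already been computed by BP or OMP, and that the partition is given as a list of index lists. The function names and the tolerance are ours.

```python
def active_groups(x, groups, tol=1e-10):
    """Indices i of the groups J_i that contain a nonzero coefficient of x."""
    return [i for i, J in enumerate(groups)
            if any(abs(x[j]) > tol for j in J)]


def classify_if_subspace_sparse(x, groups, tol=1e-10):
    """Return the class index when the support of x lies in a single group.

    Returns None when the representation touches zero or several groups,
    i.e. when it is not subspace-sparse with respect to the partition.
    """
    hit = active_groups(x, groups, tol)
    return hit[0] if len(hit) == 1 else None


groups = [[0, 1, 2], [3, 4]]  # hypothetical partition J_1, J_2
print(classify_if_subspace_sparse([0.5, 0.0, -0.2, 0.0, 0.0], groups))
print(classify_if_subspace_sparse([0.1, 0.0, 0.0, 0.3, 0.0], groups))
```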

We can also formulate a randomized result. 


Theorem 14. Suppose there are $n$ subspaces $\mathcal{S}_i$ with
dimensions $d_i$ chosen independently and uniformly at random
in $\mathbb{R}^D$. Suppose that $s_i$ points are sampled independently
and uniformly at random on each of the $n$ subspaces. Let
$\rho_i := s_i/d_i$ and $p_i := s_i/\sum_j s_j$ be the density of points
and proportion of points in subspace $i$, respectively. Then any
$b \in \bigcup_i \mathcal{S}_i$ can be correctly classified by BP and OMP if
$2 \le d_i \le \sqrt{D/2}$ and $\rho_i > 1$, $i = 1, \cdots, n$, with probability

$$p > 1 - \sum_{i=1}^n \Big( \frac{d_i \cdot 2^{d_i}}{C(D,d_i)} \sqrt{\rho_i}\, e^{-C(D,d_i)\sqrt{\rho_i}} + \frac{d_i (2e)^{d_i}}{p_i (\rho_i)^{k_i}} \Big), \qquad (27)$$

where $k_i = \frac{D - 2d_i + 1}{2(d_i - 1)} - d_i$, and $C(D, d_i)$ is a constant as before.

This result shows that classification based on subspace-sparse
recovery is expected to work if the subspace dimension
is small, the ambient dimension is large, and there are
enough samples in each subspace.


VI. Related Works and Future Directions 
A. Related works and comparison 

Prior to this work, there have been studies of subspace-sparse
recovery by BP [38], [?] and by OMP [?] in the context of
subspace clustering. In this section, we compare our results
with these works by reformulating or applying their
results to the analysis of the subspace-sparse recovery problem
considered in this work.

Theorem 1 in [38] gives a sufficient condition for the
correctness of subspace clustering by BP. While the condition
it gives is in terms of a dictionary composed of several
subspaces, we can apply it to our problem by taking points
from one specific subspace as $\mathcal{A}_0$, and all points from all other
subspaces as $\mathcal{A}_c$. The result is that subspace-sparse recovery
by BP can be achieved for all $b \in \mathcal{S}_0$ if the following is true:

$$\max_{\tilde{A}_0 \in \mathcal{W}_0} \sigma_{d_0}(\tilde{A}_0)/\sqrt{d_0} > \cos s(\mathcal{S}_0, \mathrm{span}(\mathcal{A}_c)), \qquad (28)$$

where $\mathcal{W}_0$ is the set of all full column rank submatrices $\tilde{A}_0$ of
$A_0$. The LHS of condition (28) is not easily interpretable,
and it was later observed by [?] that the LHS can be bounded
as $\max_{\tilde{A}_0 \in \mathcal{W}_0} \sigma_{d_0}(\tilde{A}_0)/\sqrt{d_0} \le \cos\gamma_0$. For the RHS of (28),
one can easily get $\cos s(\mathcal{S}_0, \mathrm{span}(\mathcal{A}_c)) \ge \cos s(\mathcal{S}_0, \mathcal{A}_c)$. Thus,
condition (28) is more restrictive than both PRC and DRC.
Actually, condition (28) may be too restrictive in most
cases, since the RHS will be equal to 1 (while the LHS is at
most 1) unless $\mathrm{span}(\mathcal{A}_c)$ intersects with the subspace $\mathcal{S}_0$ only
at the origin.

The deterministic analysis in [?] considers a slightly different
problem than that of this paper. Concretely, it considers
the subspace-sparse recovery of a specific $b \in \mathcal{S}_0$ rather than
of all points in $\mathcal{S}_0$. It asserts that if

$$\gamma_0 < s(\mathcal{A}_c, \{\pm v\}), \qquad (29)$$

then BP gives a subspace-sparse solution for $b$. In the formula,
$v$ is the so-called “dual point” (we will see that this “dual
point” is related to our definition of the set of dual points
in Definition 6), which is any solution to the program in (?).
Notice that $v$ is in $\mathcal{S}_0$ by this definition.

To compare this with our result, we apply it to all possible
$b$'s that are in subspace $\mathcal{S}_0$, and get the condition

$$\gamma_0 < s(\mathcal{A}_c, \pm\mathcal{V}_0), \qquad (30)$$

in which $\mathcal{V}_0 = \{v : \text{dual point of } b, \forall b \in \mathcal{S}_0\}$. Thus, equation
(30) is a condition for subspace-sparse recovery for all $b \in
\mathcal{S}_0$, and is now comparable to PRC and DRC. However, the
structure of $\mathcal{V}_0$ is unknown; the best one can do is to take it
to be $\mathcal{S}_0$, since the only knowledge about the points $v$ is that each of
them is in $\mathcal{S}_0$. By doing this, condition (30) becomes the
PRC. To further refine this result, one needs to investigate the
structure of the set $\mathcal{V}_0$. It is shown in the proof of Theorem
5 that for any $b$, $v$ can be taken as a point in the set $\mathcal{D}_0$.
Thus, the set of dual points $\mathcal{D}_0$ as defined in Definition 6 is
composed of all “dual points” $v$ as defined in the work of
[?]. The contribution of our work on the basis of [?] can be
viewed as specifying the structure of the set $\mathcal{V}_0$ in (30).

The above two works analyze BP. In [?], the
authors give a deterministic condition for guaranteeing correct
subspace-sparse recovery by OMP. Their condition can be
written in our notation as

coss(,Ao,,4c) < COS70 

2 _ 

— cos^ 7 o coss(5o, span(,4c)), (31) 

and if this condition holds, then OMP achieves a subspace-sparse
solution for any $b \in \mathcal{S}_0$. The LHS of (31) characterizes
the spherical distance between the points in $\mathcal{A}_0$ and the points in
$\mathcal{A}_c$, and agrees with our intuition that this distance should be large
for the purpose of subspace-sparse recovery. On the RHS, the
term $\cos s(\mathcal{S}_0, \mathrm{span}(\mathcal{A}_c))$ is the same as that on the RHS of
(28), and we have argued that this term becomes 1 unless $\mathcal{S}_0$
and $\mathrm{span}(\mathcal{A}_c)$ have trivial intersection, making the condition
difficult to satisfy. Moreover, it was shown
recently that PRC is implied by (31) [49]. Thus, this condition
is more restrictive than PRC and DRC.

^[?] shows that the LHS $\le r(\mathcal{K}_0)$, where $r(\cdot)$ is the inradius. To get
to the claim, we then use the fact that $r(\mathcal{K}_0) = \cos\gamma_0$, which is a trivial
consequence of Lemma 7.3 in [?] and Lemma 1 in this paper.

^We have used the fact that $r(\mathcal{K}_0) = \cos\gamma_0$, see the previous footnote.


B. Future directions and existing works 

The analysis of this paper assumes that the atoms of the
dictionary are noise-free. A natural follow-up question is the
robustness of the results to corruptions of the dictionary $\mathcal{A}$
and of the signal $b$. In the context of subspace clustering by
BP, this problem has already been investigated. Specifically,
in the works of [39] and [?] the authors show that with
different modifications of BP, clustering based on subspace-sparse
recovery is still provably correct. Although this is not a direct
study of the subspace-sparse recovery problem of this paper, it provides
evidence that BP or its variants are likely to be robust to
noise. More recently, the work of [50] introduces the idea of
approximate subspace-sparse solutions, and shows that under
certain conditions, the solution is approximately subspace-sparse.
This gives another promising direction for extending the
analysis of this paper to the noisy case. On the other hand, to the
best of our knowledge, the performance of subspace-sparse recovery
by OMP in the presence of noise has not been studied. However, there are
results in the study of traditional sparse recovery that show
the robustness of OMP to noise [?], [?]. This suggests the
possibility of extending the analysis of OMP for subspace-sparse
recovery to noisy cases.

VII. Conclusion 

In this work, we have studied the properties of the OMP and BP
algorithms for the task of subspace-sparse recovery and have
identified the PRC and DRC as two sufficient conditions for
guaranteeing subspace-sparse recovery. These two conditions
reveal that the dictionary atoms within the subspace need to
be well-distributed, and atoms outside of the subspace need
to be not too close to the subspace (by PRC) or to the
set of dual points in the subspace (by DRC). We further
show that with a random model of the dictionary, the
DRC is expected to hold if the subspace dimension is low and
the ambient dimension is high. We have applied our results to the
analysis of traditional sparse recovery as well as of sparse
representation based classification. In particular, we have shown
that our result not only provides guarantees for the correctness
of sparse recovery, but also that the resulting condition is less
restrictive than that given by mutual coherence.

Appendix A

Proof of Lemmas in Section III
A. Proof of Lemma 1

Lemma. Assume that $\|a_i\|_2 = 1, \forall i \in J_0$. It holds that
$\max\{\|v\|_2 : v \in \mathcal{K}_0^o\} = 1/\cos\gamma_0$.

Proof: By the definitions of $\mathcal{K}_0^o$ and $\gamma_0$, the conclusion
of the lemma can be written as

$$\max_{\|A_0^\top v\|_\infty \le 1} \|v\|_2 = 1 \Big/ \min_{\|v\|_2 = 1} \|A_0^\top v\|_\infty, \qquad (32)$$

which can be easily seen as true.


B. Proof of Lemma 2

This lemma is a particular case of a well-known result in
linear programming.

Lemma. The set $\mathcal{D}_0$ is finite. Specifically,

$$\mathrm{card}(\mathcal{D}_0) \le 2^{d_0} \binom{s_0}{d_0}, \qquad (33)$$

in which $s_0 = \mathrm{card}(\mathcal{A}_0)$.

Proof: Consider a linear program with variable $v$,
constraint $v \in \mathcal{K}_0^o$, and arbitrary objective. Since the dual points
$\mathcal{D}_0$ are the extreme points of $\mathcal{K}_0^o$, they are the same as the basic
feasible solutions of the linear program [?]. Assume that the
index set $J_0$ contains $s_0$ elements. Each basic feasible solution
is determined by $d_0$ linearly independent constraints from the
$2 \cdot s_0$ constraints of $\|A_0^\top v\|_\infty \le 1$. Obviously, there are at
most $2^{d_0} \cdot \binom{s_0}{d_0}$ ways to choose such a set of constraints. ■
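As a sanity check of (33), two small cases can be worked out by hand. The specific values of $s_0$ and $d_0$ below are chosen only for illustration.

```latex
% Case 1: linearly independent atoms, so s_0 = d_0. The bound (33) gives
\mathrm{card}(\mathcal{D}_0) \le 2^{d_0}\binom{s_0}{d_0}
  = 2^{s_0}\binom{s_0}{s_0} = 2^{s_0},
% which is attained exactly, in agreement with Lemma 8.

% Case 2: s_0 = 4 unit-norm atoms in a plane (d_0 = 2), no two of them
% equal up to sign. The bound (33) gives
2^{d_0}\binom{s_0}{d_0} = 2^{2}\binom{4}{2} = 24,
% whereas \mathcal{K}_0^o is the intersection of 4 slabs tangent to the
% unit circle, i.e. an octagon, so that \mathrm{card}(\mathcal{D}_0) = 8.
```

The second case shows that the bound is in general not tight, which is harmless for the union bound in which it is used.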


Appendix B

Proofs for Section IV
A. Proof for Lemma 6

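Lemma. For any $\theta \in [0, \pi/2]$ and any $p \ge 2$,

$$\frac{v_{p-1}}{p\,v_p} \sin^{p-1}\theta \le \frac{\sigma_{p-1}(\mathbb{S}^{p-1}_\theta(w))}{\sigma_{p-1}(\mathbb{S}^{p-1})} \le \frac{v_{p-1}}{v_p} \sin^{p-1}\theta, \qquad (34)$$

in which $v_p$ is defined in (21).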


Proof: The idea is similar to that in [?]. We first prove
the upper bound. See Figure 3 for an illustration, in which we
project $\mathbb{S}^{p-1}$ onto any two-dimensional space that contains the
origin and $w$. The portion of the area of the spherical cap over
the entire $\mathbb{S}^{p-1}$ is the same as the portion of the volume of the
red dashed cone intersecting with $B^p(1)$ over the volume of
$B^p(1)$. Also note that the part of the red cone in $B^p(1)$
lies completely in the green dotted cylinder. Thus,

$$\frac{\sigma_{p-1}(\mathbb{S}^{p-1}_\theta(w))}{\sigma_{p-1}(\mathbb{S}^{p-1})} = \frac{\mathrm{vol}(\text{Cone} \cap B^p(1))}{\mathrm{vol}(B^p(1))} \le \frac{\mathrm{vol}(\text{Cylinder})}{\mathrm{vol}(B^p(1))} = \frac{\sin^{p-1}\theta \cdot v_{p-1} \cdot 1}{1^p \cdot v_p} = \sin^{p-1}\theta\, \frac{v_{p-1}}{v_p}; \qquad (35)$$

this proves the upper bound.

For the lower bound, consider again the part of the red cone
in $B^p(1)$; its volume is bounded below by that of the intersection
of the red and the cyan cones. It is known that the volume of
a $p$-dimensional cone (i.e. a cone with a $p - 1$ dimensional
base) is the product of the $p - 1$ dimensional area of its base
and its height, divided by $p$. Thus, one can see that the volume
of the intersection of the two cones is $v_{p-1} \sin^{p-1}\theta \cdot 1/p$. The
conclusion thus follows from this discussion. ■

Fig. 3. Illustration for proving bounds for area of spherical cap.


B. Proof for Lemma 7

Lemma. The covering number of $\mathbb{S}^{p-1}$, $p \ge 2$, is bounded by

$$C(\mathbb{S}^{p-1}, \epsilon) \le \frac{p\,v_p}{v_{p-1}} \cdot \frac{1}{\sin^{p-1}(\epsilon/2)}.$$




Proof: A standard way of bounding the covering number is
to construct a specific $\epsilon$-covering $\mathcal{V}$. Concretely, initialize $\mathcal{V}$
as empty. In the first step, add an arbitrary point in $\mathbb{S}^{p-1}$ into
$\mathcal{V}$. In the following steps, find any point $w$ in $\mathbb{S}^{p-1}$ which
satisfies $s(w, \mathcal{V}) > \epsilon$ and add this $w$ into $\mathcal{V}$. The procedure is
terminated when no such point exists.

It is easy to see that this procedure must terminate in a finite
number of iterations. In fact, we will provide an upper bound
on the number of iterations.

Before that, we first point out that the $\mathcal{V}$ constructed in
this way is an $\epsilon$-covering of $\mathbb{S}^{p-1}$, or equivalently, $\gamma(\mathcal{V}) \le \epsilon$.
Otherwise, there would be a $w$ such that $s(w, \mathcal{V}) > \epsilon$, and by
the procedure above, this $w$ should be added to $\mathcal{V}$. Thus, we
can bound the covering number $C(\mathbb{S}^{p-1}, \epsilon)$ by the cardinality
of $\mathcal{V}$ that we constructed above.
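The greedy construction above can be sketched in code. The version below is a simplification made for illustration: instead of searching the whole sphere, it scans a finite pool of random candidate points, so the resulting set covers the pool rather than all of $\mathbb{S}^{p-1}$. The function names, pool size, radius and seed are ours.

```python
import math
import random


def angle(u, v):
    """Spherical distance s(u, v) between two unit vectors."""
    inner = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, inner)))


def random_unit_vector(p, rng):
    """A uniform random point on S^{p-1} (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def greedy_net(candidates, eps):
    """Add a candidate whenever it is farther than eps from the current set."""
    net = []
    for w in candidates:
        if all(angle(w, v) > eps for v in net):
            net.append(w)
    return net


def lemma7_bound(p, eps):
    """Upper bound of Lemma 7: p * v_p / (v_{p-1} * sin^{p-1}(eps / 2))."""
    def v(q):
        return math.pi ** (q / 2) / math.gamma(q / 2 + 1)
    return p * v(p) / (v(p - 1) * math.sin(eps / 2) ** (p - 1))


rng = random.Random(0)
pool = [random_unit_vector(3, rng) for _ in range(300)]
net = greedy_net(pool, 0.5)
print(len(net), lemma7_bound(3, 0.5))
```

By construction the selected points are pairwise more than $\epsilon$ apart and every candidate is within $\epsilon$ of some selected point, so the packing argument used in the rest of the proof applies to the selected set.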

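We now give a bound on $\mathrm{card}(\mathcal{V})$. Imagine that centered at
each point in $\mathcal{V}$ we draw a ball (in the metric space $(\mathbb{S}^{p-1}, s(\cdot,\cdot))$)
with radius $\epsilon/2$. Then by the construction of $\mathcal{V}$, any two points
in $\mathcal{V}$ are more than $\epsilon$ apart, so the balls do not intersect
each other. Notice that, as shown by (22), we can bound the area
measure of these balls, i.e., for any $w \in \mathcal{V}$,

$$\frac{\sigma_{p-1}(\mathbb{S}^{p-1}_{\epsilon/2}(w))}{\sigma_{p-1}(\mathbb{S}^{p-1})} \ge \frac{v_{p-1}}{p\,v_p} \sin^{p-1}\frac{\epsilon}{2};$$

the result thus follows from

$$C(\mathbb{S}^{p-1}, \epsilon) \le \mathrm{card}(\mathcal{V}) \le \frac{\sigma_{p-1}(\mathbb{S}^{p-1})}{\sigma_{p-1}(\mathbb{S}^{p-1}_{\epsilon/2}(w))} \le \frac{p\,v_p}{v_{p-1}} \cdot \frac{1}{\sin^{p-1}(\epsilon/2)}.$$

C. Proof for Theorem 10

Theorem. Let $\mathcal{P} \subseteq \mathbb{S}^{p-1}$, $p \ge 2$, be a set of $K$ points that
are drawn independently and uniformly at random on $\mathbb{S}^{p-1}$.
Then for any $\gamma^* \le \pi/2$, it holds that $\gamma(\pm\mathcal{P}) \le \gamma^*$ with probability
at least $1 - \frac{p\,v_p}{v_{p-1} \sin^{p-1}(\gamma^*/4)} \exp\big(-\frac{2 v_{p-1}}{p\,v_p} K \sin^{p-1}\frac{\gamma^*}{2}\big)$.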

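Proof: Let $\epsilon = \gamma^*/2$, and let $\mathcal{V}$ be any $\epsilon$-covering of
$\mathbb{S}^{p-1}$ such that $\mathrm{card}(\mathcal{V}) = C(\mathbb{S}^{p-1}, \epsilon)$. Centered at each point
of $\mathcal{V}$ draw a ball with radius $\epsilon$; then the union of these
balls covers the entire sphere. The idea of the proof is that if
each of the balls contains at least one point of the set $\pm\mathcal{P}$, then
the covering radius $\gamma(\pm\mathcal{P})$ is bounded by $2\epsilon$. This is because
any $w \in \mathbb{S}^{p-1}$ lies in at least one of the balls,
and when this ball contains at least one point of $\pm\mathcal{P}$, the
distance $s(w, \pm\mathcal{P})$ is bounded above by $2\epsilon$. Concretely, denote
$M := \mathrm{card}(\mathcal{V})$ and let $B_1, \cdots, B_M$ be the balls described
above, and write $\gamma := \gamma(\pm\mathcal{P})$; then

$$P(\gamma > 2\epsilon) \le P(\exists i \in \{1, \cdots, M\} \text{ s.t. } B_i \cap \pm\mathcal{P} = \emptyset) \le \sum_{i=1}^M P(B_i \cap \pm\mathcal{P} = \emptyset) = \sum_{i=1}^M \Big(1 - 2\frac{\sigma_{p-1}(B_i)}{\sigma_{p-1}(\mathbb{S}^{p-1})}\Big)^K,$$

where the factor of 2 appears in the last expression because we
are using the symmetrized points $\pm\mathcal{P}$. Noticing that each $B_i$ is a
spherical cap of radius $\epsilon$, we can use the result of (22) to give
a bound on it. We get

$$P(\gamma > 2\epsilon) \le \sum_{i=1}^M \Big(1 - \frac{2 v_{p-1}}{p\,v_p} \sin^{p-1}\epsilon\Big)^K \le M \exp\Big(-\frac{2 v_{p-1}}{p\,v_p} K \sin^{p-1}\epsilon\Big), \qquad (36)$$

in which $M$ can be further bounded by the result of Lemma 7, so

$$P(\gamma > 2\epsilon) \le \frac{p\,v_p}{v_{p-1} \sin^{p-1}(\epsilon/2)} \exp\Big(-\frac{2 v_{p-1}}{p\,v_p} K \sin^{p-1}\epsilon\Big).$$

This proves the theorem.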


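D. Proof for Theorem 9

In this section, we finish what is left undiscussed in the roadmap
of proof for Theorem 9, and this will complete the proof.

Proof:

The proof proceeds by giving probabilistic bounds on both sides of
DRC separately and then applying the well-known union bound
to combine the results. In this proof we write $d := d_0$, $s := s_0$
and $k := k_0$ to simplify notation.

For any $\gamma^* \le \pi/2$, the LHS and RHS of DRC are bounded in
(24) and (23), respectively. By applying the union bound we get

$$P(\text{DRC is satisfied}) = P(\gamma_0 < s(\mathcal{D}_0, \mathcal{A}_c)) \ge 1 - \boxed{\frac{d\,v_d}{v_{d-1} \sin^{d-1}(\gamma^*/4)}} \exp\Big(-\boxed{s\,\frac{2 v_{d-1}}{d\,v_d} \sin^{d-1}\frac{\gamma^*}{2}}\Big) - \boxed{\lambda s \binom{s}{d} 2^d\, \frac{v_{D-1}}{v_D} \sin^{D-1}\gamma^*}. \qquad (37)$$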
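Now, we take a special value of $\gamma^*$ as

$$\sin^{D-1}\gamma^* = \Big(\frac{s}{d}\Big)^{-0.5\frac{D-1}{d-1}} \frac{v_D}{v_{D-1}}, \qquad (38)$$

or equivalently,

$$\sin^{d-1}\gamma^* = \Big(\frac{s}{d}\Big)^{-0.5} \Big(\frac{v_D}{v_{D-1}}\Big)^{\frac{d-1}{D-1}}, \qquad (39)$$

and we will argue that such a $\gamma^* \le \pi/2$ exists at the end of this
proof.

Define the following for later use:

$$C(D,d) = \frac{1}{2^{d-2}} \cdot \frac{v_{d-1}}{v_d} \Big(\frac{v_D}{v_{D-1}}\Big)^{\frac{d-1}{D-1}}. \qquad (40)$$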


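For easier presentation, we take the three boxed parts from the
RHS of (37) and provide bounds for them separately, and then
combine them to get the final result.

For the first part, we compute

$$\frac{d\,v_d}{v_{d-1} \sin^{d-1}(\gamma^*/4)} \le \frac{4^{d-1}\, d\,v_d}{v_{d-1} \sin^{d-1}\gamma^*} = d \cdot 2^{2(d-1)}\, \frac{v_d}{v_{d-1}} \Big(\frac{v_{D-1}}{v_D}\Big)^{\frac{d-1}{D-1}} \sqrt{\frac{s}{d}} = \frac{d \cdot 2^d}{C(D,d)} \sqrt{\frac{s}{d}}, \qquad (41)$$

in which we have used the result that $\sin(2x) \le 2\sin(x)$ for
any $x \in [0, \pi]$.

For the second part,

$$s\,\frac{2 v_{d-1}}{d\,v_d} \sin^{d-1}\frac{\gamma^*}{2} \ge s\,\frac{2 v_{d-1}}{d\,v_d} \cdot \frac{\sin^{d-1}\gamma^*}{2^{d-1}} = \frac{s}{d} \cdot \frac{v_{d-1}}{2^{d-2}\, v_d} \Big(\frac{s}{d}\Big)^{-0.5} \Big(\frac{v_D}{v_{D-1}}\Big)^{\frac{d-1}{D-1}} = C(D,d) \sqrt{\frac{s}{d}}. \qquad (42)$$

For the third part, use the fact that $\binom{s}{d} \le (\frac{es}{d})^d$ to get

$$\lambda s \binom{s}{d} 2^d\, \frac{v_{D-1}}{v_D} \sin^{D-1}\gamma^* \le \lambda s \Big(\frac{es}{d}\Big)^d 2^d \Big(\frac{s}{d}\Big)^{-0.5\frac{D-1}{d-1}} = \lambda d (2e)^d \Big(\frac{s}{d}\Big)^{(d+1) - 0.5\frac{D-1}{d-1}} = \lambda d (2e)^d \Big(\frac{d}{s}\Big)^k. \qquad (43)$$

Combining the above three parts into (37) we get

$$P(\text{DRC is satisfied}) \ge 1 - \lambda d (2e)^d \Big(\frac{d}{s}\Big)^k - \frac{d \cdot 2^d}{C(D,d)} \sqrt{\frac{s}{d}} \exp\Big(-C(D,d) \sqrt{\frac{s}{d}}\Big), \qquad (44)$$

which is the conclusion in (18).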

For the remainder of the proof, we need the
following result:


Vp-l 


p+l 


L v' 27 r(p -f 2 ) ’ V 27 r 


p+l 


(45)
which is obtained by combining the formula for $v_p$
in (21) and the following result [?]:


Vp + 1 


rtE+ii 

s v^T(|r s 


(46) 


We now show that the $\gamma^*$ in (38) is well-defined. It boils
down to showing that the RHS of (38) is less than 1. Note that
the first factor is less than one since $s > d$. The
second factor can be upper bounded by (45), i.e.


vd + 2 ) 

^ . (47) 

vd-1 D + l 

in which the RHS is a decreasing function of $D$ and is less
than 1 when $D = 7$. As it is required in the theorem that
$D \ge 2d^2 \ge 8$, we can conclude that the RHS of (38) is less
than one.

In the rest part of this proof, we show the properties C{D,d) 
as a function of D and d. First, we show that C{D,d) is 
increasing in D. Compute that 


C{D,d) _/ I’D ^ 
C{D-l,d) ~ Kd-J 


d-1 

xtt 


^VD-l 



> 1 , 


(48) 


where we have used the result (|45]) . and the last inequal¬ 
ity comes from the following observations: Let f{D) = 
(^ ^ One can compute that /(7) > 1, and / 
is an increasing function of D by calculus. Thus, C{D,d) > 
C{D- l,d) if £» > 7. 

Similarly, for showing that C{D,d) is decreasing in d, we 
compute the ratio 


C{D,d) _ 1 Ud-i Vd-i vp 

C{D,d-l) 2 Vd Vd-2 \J VD-i 




(49) 


d + 1 
2d 


(D-l) 



/dT 2 

D + 1 


< 1 , 


in which we have used the result (l45T l. and in the last step we 
use the fact that < 1 when d> 2, and that < 1 

when D > 7. 

Finally, to give a lower bound on $C(D,d)$, we use equation
(45) again and get


C{D,d) > 


1 d+l / / 27r \:^ 
V2^(d + !yW D + l) 


(50) 


For the RHS, we can have the bound ^ 

Moreover, let g{D) = 0 - 1 , by calculus, one can see 

that g{D) takes minimum when D = 14. Thus 


C{D, d) > 



> 


0.79Vd 
2.07^-! ■ 


(51) 


This finishes all the claims of the theorem. ■ 
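The properties of $C(D,d)$ claimed in Theorem 9 can be checked numerically. The sketch below is not part of the proof: it assumes the expression for $C(D,d)$ in (40) and the exponent $k_0 = \frac{D-1}{2(d_0-1)} - (d_0+1)$ that follows from the choice of $\gamma^*$ in (38), both as reconstructed here, and works with logarithms of $v_p$ to avoid overflow. The function names and the parameter values are ours.

```python
import math


def log_v(p):
    """log of v_p = pi^(p/2) / Gamma(p/2 + 1)."""
    return (p / 2) * math.log(math.pi) - math.lgamma(p / 2 + 1)


def C(D, d):
    """C(D, d) = 2^-(d-2) * (v_{d-1}/v_d) * (v_D/v_{D-1})^((d-1)/(D-1))."""
    log_c = (log_v(d - 1) - log_v(d)
             - (d - 2) * math.log(2.0)
             + (d - 1) / (D - 1) * (log_v(D) - log_v(D - 1)))
    return math.exp(log_c)


def C_lower_bound(d):
    """The lower bound 0.79 * sqrt(d) / 2.07^(d-1) stated in Theorem 9."""
    return 0.79 * math.sqrt(d) / 2.07 ** (d - 1)


def drc_probability_bound(D, d0, rho0, lam):
    """Right-hand side of (18) under the assumptions stated above."""
    c = C(D, d0)
    k0 = (D - 1) / (2 * (d0 - 1)) - (d0 + 1)
    term1 = d0 * 2 ** d0 / c * math.sqrt(rho0) * math.exp(-c * math.sqrt(rho0))
    term2 = lam * d0 * (2 * math.e) ** d0 * rho0 ** (-k0)
    return 1.0 - term1 - term2


for d in (2, 3, 4):
    for D in (2 * d * d, 50, 200):
        print(d, D, C(D, d), C_lower_bound(d))
print(drc_probability_bound(200, 2, 1000.0, 1.0))
```

The printed values of $C(D,d)$ grow with $D$, shrink with $d$, and stay above the stated lower bound, and the probability bound is close to 1 for a large sample density.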


Appendix C

Proof of results in Section V-A
A. Proof of Lemma 8

Lemma. For Aq which has sq linearly independent atoms, 
the set of dual points, Dq, contains exactly 2 ®° points 
specified by {Ao(A(]^Ao)“^ • u, u G Ugg}, where Ugg '■= 
{[ui, • • • , Usg],Ui = ±1, z = 1, • • • , So}. 

Proof: From Lemma|2] there are possibly at most 2® dual 
points in the case where Aq is of full column rank. So in 
order to prove the result, it is enough to show that the set 
{Ao(A(}Ao)“^ • u,u G Us} contains 2 ® points, and each of 
them is a dual point. 

To show that there are 2® different points, notice that 
Us has 2 ® points, so we are left to show that for any 
Ui,U 2 G Us with Ui f U 2 , it has Ao(Ao Ao)“^Ui f 
Ao(A(}A o)“^U 2 . This can be easily established by notic¬ 
ing that rank(Ao(A(}Ao)“^) = rank(Ao) = s, i.e., 
Ao(A(}Ao)“^ is also of full column rank, so its null space 
contains only the origin. Consequently, if Ao(A[}Ao)“^Ui = 
Ao(A(}A o)“^U 2 , then Ui = U 2 , which is a contradiction. 

Now we show that Ao(A[} Ao)“^Uo is a dual point for any 
Uo G Us- Denote Vq = Ao(A[}Ao)“^Uo. By definition, we 
need to show that Vq is an extreme point of the set /Cg = {v G 
5o : ||A(}v||oo < Ij- First, Vq is in /Cg because ||A(}vo||oo = 

11 Uo I loo = 1- Second, suppose there are two points, Vi,V 2 G 
/eg, such that 

Vo = (1 - A)vi + Av 2 (52) 

for some A G (0,1), we need to show that it must be the case 
that Vi = V 2 . Notice that the columns of Ao(A(}Ao)“^ span 
the space 5o and that Vi,V 2 G /Cg C 5o, there exists Xi,X 2 
such that Vi = Ao(A(} Ao)“^Xi, i = 1, 2. Then by using (l52l i. 
it has 


Ao(AoAo) ^uq 

= (1 — A)Ao(A(}A q) ^xi-f AAo(A(}Ao) ^X2, (53) 

and by left multiplying A(}, we have 

Uo = (1 - A)xi -f Ax 2 . (54) 

Now, consider the equation for each entry separately in (l54l l. 
i.e., [uo]j = (1 — A)[xi]i + A[x 2 ]i, where i indexes an entry 
in the vector. The left hand side, being ±1, is a extreme point 
of the set [— 1 , 1 ], while the right hand side is the convex 
combination of two points in [— 1 , 1 ], so it necessarily has that 
[xi]i = [x 2 ]i. This is true for all entries z, so Xi = X 2 , thus 
Vi = V 2 , which shows that Vo is indeed an extreme point. ■ 
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B. Proof of Theorem 1721 

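Theorem. If a dictionary $\mathcal{A}$ satisfies $\mu(\mathcal{A}) < \frac{1}{2s_0 - 1}$, then for any partition of $\mathcal{A}$ into $\mathcal{A}_0$ and $\mathcal{A}_c$ where $\operatorname{card}(\mathcal{A}_0) = s_0$, the atoms in $\mathcal{A}_0$ are linearly independent and PRC and DRC hold.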


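Proof: Suppose $\mu(\mathcal{A}) < 1/(2s - 1)$, where we write $s$ for $s_0$. We need to show that $\operatorname{rank}(A_0) = s$ and that PRC and DRC hold. First, the result that $\operatorname{rank}(A_0) = s$ is well established in studies of sparse recovery. We then only need to show that PRC is true, as DRC is implied by PRC.

We start by giving an upper bound on $1/\cos\gamma_0$. From Lemma 8, any $v \in \mathcal{K}_0^o$ with $v \neq 0$ can be written as $v = A_0(A_0^\top A_0)^{-1}u$ for some $u \neq 0$ with $\|u\|_\infty \le 1$. Thus,

$$\|v\|_2^2 = v^\top v = u^\top (A_0^\top A_0)^{-1} u \le s \cdot \frac{u^\top (A_0^\top A_0)^{-1} u}{u^\top u},$$

where the inequality uses $u^\top u \le s$. Denote by $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ the maximum and minimum eigenvalue of a symmetric matrix, respectively. We get

$$\|v\|_2^2 \le s \cdot \max_{u \neq 0} \frac{u^\top (A_0^\top A_0)^{-1} u}{u^\top u} = s \cdot \lambda_{\max}\big((A_0^\top A_0)^{-1}\big) = \frac{s}{\lambda_{\min}(A_0^\top A_0)}.$$

Notice that $A_0^\top A_0$ is close to an identity matrix, i.e., its diagonal entries are 1 and the magnitude of each off-diagonal entry is bounded above by $\mu(\mathcal{A})$. By Gershgorin's disc theorem, $\lambda_{\min}(A_0^\top A_0) \ge 1 - (s - 1)\mu(\mathcal{A})$, so

$$\|v\|_2^2 \le \frac{s}{1 - (s - 1)\mu(\mathcal{A})}.$$

As a consequence, $1/\cos\gamma_0 \le \sqrt{s / (1 - (s - 1)\mu(\mathcal{A}))}$ (Lemma 1).

In the second step, we give an upper bound for the right hand side of PRC. By definition,

$$\cos s(\mathcal{A}_c, \mathcal{S}_0) = \max_{v \in \mathcal{S}_0,\ \|v\|_2 = 1} \|A_c^\top v\|_\infty.$$

We thus need to bound $\|A_c^\top v\|_\infty$ for any $v \in \mathcal{S}_0$ with $\|v\|_2 = 1$. Consider the optimization program

$$x^* = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad v = A_0 x,$$

and its dual program

$$\max_\omega \langle \omega, v \rangle \quad \text{s.t.} \quad \|A_0^\top \omega\|_\infty \le 1.$$

Strong duality holds since the primal problem is feasible. Because $v \in \mathcal{S}_0$ and $A_0^\top \omega$ depends only on the projection of $\omega$ onto $\mathcal{S}_0$, the dual variable can be restricted to $\mathcal{S}_0$, i.e., $\omega \in \mathcal{K}_0^o$, and the objective of the dual is bounded by $\|\omega\|_2 \|v\|_2 \le 1/\cos\gamma_0$. Consequently, $\|x^*\|_1 \le 1/\cos\gamma_0$. This leads to

$$\|A_c^\top v\|_\infty = \|A_c^\top A_0 x^*\|_\infty \le \|A_c^\top A_0\|_\infty \|x^*\|_1 \le \mu(\mathcal{A})/\cos\gamma_0,$$

in which $\|\cdot\|_\infty$ for a matrix treats the matrix as a vector.

Now we combine the results from the above two parts:

$$\cos s(\mathcal{A}_c, \mathcal{S}_0) \le \frac{\mu(\mathcal{A})}{\cos\gamma_0} = \cos\gamma_0 \cdot \frac{\mu(\mathcal{A})}{\cos^2\gamma_0} \le \cos\gamma_0 \cdot \frac{s\mu(\mathcal{A})}{1 - (s - 1)\mu(\mathcal{A})},$$

in which

$$\frac{s\mu(\mathcal{A})}{1 - (s - 1)\mu(\mathcal{A})} = 1 + \frac{\mu(\mathcal{A})(2s - 1) - 1}{1 - (s - 1)\mu(\mathcal{A})} < 1,$$

thus $\cos s(\mathcal{A}_c, \mathcal{S}_0) < \cos\gamma_0$, which is the PRC. ■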
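The arithmetic behind the coherence condition can be checked with a few lines of code. This sketch is only an illustration: the dictionary built by `simplex_atoms` is a hypothetical example (five unit-norm atoms with pairwise inner product $-1/4$), and the function names are not from the paper. It computes the mutual coherence, compares it with the threshold $1/(2s - 1)$, and evaluates the factor $s\mu/(1 - (s - 1)\mu)$ in both of its forms.

```python
import math


def simplex_atoms(n):
    """Hypothetical dictionary: n unit-norm atoms with pairwise inner product -1/(n-1)."""
    scale = math.sqrt(n / (n - 1.0))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / n) for j in range(n)]
            for i in range(n)]


def mutual_coherence(atoms):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    best = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            dot = sum(x * y for x, y in zip(atoms[i], atoms[j]))
            best = max(best, abs(dot))
    return best


def prc_factor(mu, s):
    """Factor s*mu / (1 - (s-1)*mu) multiplying cos(gamma_0) in the final bound."""
    return s * mu / (1.0 - (s - 1) * mu)


def prc_factor_rewritten(mu, s):
    """The same factor written as 1 + (mu*(2s-1) - 1) / (1 - (s-1)*mu)."""
    return 1.0 + (mu * (2 * s - 1) - 1.0) / (1.0 - (s - 1) * mu)


atoms = simplex_atoms(5)
mu = mutual_coherence(atoms)
s = 2  # assumed sparsity level for this illustration
print("coherence:", round(mu, 6), "threshold:", 1.0 / (2 * s - 1))
print("Gershgorin lower bound on lambda_min:", 1.0 - (s - 1) * mu)
print("factor:", prc_factor(mu, s), "rewritten:", prc_factor_rewritten(mu, s))
```

When the coherence is below the threshold, the factor is below 1, which is exactly what makes the last inequality of the proof strict. A coherence above the threshold (for instance $\mu = 0.4$ with $s = 2$) gives a factor larger than 1, so this argument no longer yields the PRC.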


Appendix D 

Proof of results in Section IV-B

Theorem 13 is a direct application of the result in Theorem 2 to all subspaces $i = 1, \dots, n$.

For Theorem 14, the result is obtained by applying the union bound. We give more details on this proof since the probabilistic model is not the same as that in Theorem 9, and certain points need to be clarified. Concretely, let $E_i$ be the event that the condition

< s{Vi,A\Ai) (55) 

is satisfied, $i = 1, \dots, n$. For a fixed $i$, the LHS of (55) can be upper bounded in the same way as in (24) by using Theorem
[IS i.e. 


Phi < 7*) > 1 


dj ■ Vdj 1 

V(di-i) sm(‘^*“L ll 


exp(-s^ ^). 

Vdi 2 


(56) 


For the RHS of (55), the analysis is similar to the one that leads to Equation (27). For any point $v \in V_i$ and $w \in \mathcal{A} \setminus \mathcal{A}_i$, we observe that both of them have a uniform distribution on the unit sphere and that they are independent because they are from different subspaces. Thus one gets


\dij VD 

(57) 

By combining these two bounds in the same way as in the 
proof of Theorem [S one get 


P{E,) > 1 


dj • 2 *^* -c{D,di)^fpi ^ji^i di{2eh' 


> 1 - 


d, ■ 2'^' 
C{D,di) 


^.g-C(D,d,)VpT 


Pt{Pih 


By applying the union bound,

$$P(\text{SRC succeeds}) = P\Big(\bigcap_{i=1}^{n} E_i\Big) \ge 1 - \sum_{i=1}^{n} \big(1 - P(E_i)\big), \tag{59}$$

one can get the conclusion in (27).
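The union bound step in (59) is simple enough to evaluate directly. The following sketch assumes hypothetical per-subspace probabilities $P(E_i)$; the numbers are placeholders and are not results from the paper.

```python
def union_bound_success(event_probs):
    """Lower bound 1 - sum_i (1 - P(E_i)) on the probability that all events hold."""
    return 1.0 - sum(1.0 - p for p in event_probs)


# Hypothetical values of P(E_i) for n = 3 subspaces (illustration only).
probs = [0.99, 0.98, 0.97]
print(union_bound_success(probs))
```

The bound does not require the events $E_i$ to be independent, which is why it can be applied here even though the per-subspace conditions share the same dictionary. It becomes vacuous (non-positive) once the failure probabilities sum to 1 or more.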